Grimmer Kang
The main advantage of FL (Federated Learning) is privacy: the training data is never uploaded to the aggregation server; only model parameters are uploaded.
Data Heterogeneity: this is about non-IID (not independent and identically distributed) data; the IID assumption only holds for traditional distributed training. Non-IID data could in principle be handled at an extreme communication cost, which is not practical, so the solution below (FedAvg) is used instead (a sketch of a label-skewed, non-IID client split follows).
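A minimal sketch of what a non-IID (label-skewed) client partition can look like; the dataset, shard counts, and function name are illustrative assumptions, not from the paper:

```python
import numpy as np

def label_skew_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Illustrative non-IID split: sort samples by label, cut them into
    shards, and give each client only a few shards (so few label classes)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)  # group sample indices by label
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))
    clients = []
    for c in range(num_clients):
        picked = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append(np.concatenate([shards[s] for s in picked]))
    return clients  # list of index arrays, one per client

# Example: 10 classes, 100 clients -> each client sees roughly 2 classes.
labels = np.random.randint(0, 10, size=60_000)
parts = label_skew_partition(labels, num_clients=100)
```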
FedAvg (Federated Averaging): the server averages the clients' model weights. Uploading weights after several local training steps, instead of per-step gradients, also saves communication overhead (see the sketch below).
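A minimal FedAvg aggregation sketch, weighting each client by its number of local samples; representing a model as a list of NumPy arrays is an assumption for illustration:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of per-client model weights.

    client_weights: list of models, each a list of np.ndarray layer weights
    client_sizes:   number of local training samples per client
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=np.float64)
        for weights, n in zip(client_weights, client_sizes):
            acc += (n / total) * weights[layer]
        averaged.append(acc)
    return averaged  # becomes the new global model for the next round
```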
System Heterogeneity: this is about clients with different hardware, scalability, and production-quality system design. Both solutions below apply FedAvg together with secure (parameter) aggregation/computation, which is introduced in the next section.
TOWARDS FEDERATED LEARNING AT SCALE: SYSTEM DESIGN (mobile devices, TensorFlow)
Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach (PySyft)
Membership Inference Attack: a privacy attack that tries to infer whether a particular sample was part of the training data. Defenses:
Secure Computation: Homomorphic Encryption, Secure Multiparty Computation (SMC), etc. (a minimal secure-aggregation masking sketch follows this list)
Differential Privacy (DP)
Trusted Execution Environment.
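As a rough illustration of the secure-aggregation idea (not any specific published protocol): clients add pairwise masks that cancel out when the server sums the uploads, so only the aggregate is revealed. The shared per-pair seeds are assumed to be agreed out of band:

```python
import numpy as np

def masked_update(update, client_id, peer_ids, pair_seeds):
    """Add pairwise masks to one client's update. For each pair (i, j) with a
    shared seed, client i adds +mask and client j adds -mask, so the masks
    cancel in the server-side sum."""
    masked = update.copy()
    for peer in peer_ids:
        seed = pair_seeds[frozenset((client_id, peer))]  # shared out of band
        mask = np.random.default_rng(seed).normal(size=update.shape)
        masked += mask if client_id < peer else -mask
    return masked

# Server side: sum of masked updates == sum of true updates (masks cancel),
# while each individual masked update looks like noise.
```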
Data Poisoning & Model Poisoning Attacks (malicious clients try to reduce model accuracy). Defenses:
The most common defense measures are rejections based on error rate and loss function, i.e., try to detect and remove unreasonable model-parameter updates (a sketch follows).
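A hedged sketch of that rejection idea: score each proposed update on a small server-side validation set and drop updates that worsen the loss beyond a tolerance. The evaluate function, validation set, and tolerance are assumptions for illustration:

```python
def filter_updates(global_weights, updates, evaluate, tol=0.02):
    """Keep only updates that do not degrade validation loss too much.

    evaluate(weights) -> validation loss on a small held-out set (placeholder
    for the real evaluation pipeline).
    """
    base_loss = evaluate(global_weights)
    accepted = []
    for upd in updates:
        candidate = [w + d for w, d in zip(global_weights, upd)]
        if evaluate(candidate) <= base_loss + tol:
            accepted.append(upd)  # looks reasonable, keep it
        # else: drop the update as a suspected poisoned contribution
    return accepted
```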
Backdoor Attacks (malicious clients mislabel specific (sub)tasks so the model misbehaves on them while overall accuracy stays high). Defenses:
Participant-level differential privacy
Norm thresholding of updates (detect and scale down or remove unreasonable updates; a clipping sketch follows this list)
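A minimal norm-thresholding sketch; the threshold value is an assumed hyperparameter, and a real system might reject an over-norm update outright instead of scaling it:

```python
import numpy as np

def clip_update(update, max_norm=1.0):
    """Norm thresholding: scale down any update whose L2 norm exceeds
    max_norm, which limits how far a single (possibly backdoored) client
    can move the global model."""
    flat = np.concatenate([layer.ravel() for layer in update])
    norm = np.linalg.norm(flat)
    if norm <= max_norm:
        return update
    scale = max_norm / norm
    return [layer * scale for layer in update]
```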
FINAL REMARKS: Is it possible to develop a Byzantine-tolerant FL model while ensuring user privacy, using schemes with low computational cost?
From the NVIDIA reference:
ref2: The aggregator randomly selects a client as the leader, who generates an HE key pair and synchronizes it to all the other clients.
ref3: Distributed Additive Encryption and Quantization for Privacy Preserving Federated Deep Learning (addresses the residual risk in ref2, where the server is needed to broadcast the HE key pair).
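A minimal sketch of additive HE aggregation using the python-paillier (phe) library; in the ref2 scheme the leader client would generate this key pair and share it with the other clients, never with the server. Scalars instead of full weight vectors are used here for brevity:

```python
from phe import paillier  # pip install phe (python-paillier)

# A randomly selected leader client generates the key pair and shares it
# with the other clients; the server never sees the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (scalar, for brevity) model parameter.
client_params = [0.12, -0.05, 0.30]
ciphertexts = [public_key.encrypt(p) for p in client_params]

# The server adds ciphertexts without decrypting anything (additive HE).
encrypted_sum = ciphertexts[0] + ciphertexts[1] + ciphertexts[2]

# Only a key-holding client can decrypt the aggregate.
average = private_key.decrypt(encrypted_sum) / len(client_params)
print(average)  # ~0.1233
```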
In differential privacy schemes, a user's contribution is masked by clipping the model parameters (or update) and adding noise to them before model aggregation; the added noise costs some model accuracy (a sketch of this clip-and-noise step follows).
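A minimal sketch of that clip-then-add-noise step for one client update; the clip bound and noise multiplier are assumed hyperparameters, and a real DP-FL system would calibrate the noise to a target epsilon/delta:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, seed=None):
    """Clip the update to a fixed L2 norm, then add Gaussian noise.
    Clipping bounds each user's contribution; the noise masks it."""
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise  # this is what gets sent for aggregation
```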
From Wikipedia (randomized response example):
A simple example, especially developed in the social sciences, is to ask a person to answer the question "Do you own the attribute A?", according to the following procedure:
Toss a coin.
If heads, then toss the coin again (ignoring the outcome), and answer the question honestly.
If tails, then toss the coin again and answer "Yes" if heads, "No" if tails.
But, overall, these data with many responses are significant: a "Yes" is given by a quarter of the people who do not have attribute A and by three-quarters of those who do, so the true proportion can still be estimated (see the sketch below).
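A short sketch of why the noisy answers are still useful in aggregate: with this coin procedure the expected fraction of "Yes" answers is 0.25 + 0.5*p, where p is the true proportion owning attribute A, so p can be estimated as 2*(yes_fraction) - 0.5. The simulated population below is an illustrative assumption:

```python
import numpy as np

def randomized_response(has_attribute, rng):
    """One respondent following the coin procedure above."""
    if rng.random() < 0.5:        # heads: answer honestly
        return has_attribute
    return rng.random() < 0.5     # tails: toss again, answer Yes on heads

rng = np.random.default_rng(0)
true_p = 0.30                     # true proportion owning attribute A
answers = [randomized_response(rng.random() < true_p, rng)
           for _ in range(100_000)]

yes_fraction = np.mean(answers)
estimated_p = 2 * yes_fraction - 0.5  # since E[yes] = 0.25 + 0.5 * p
print(round(estimated_p, 3))          # close to 0.30
```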
Extra issues not explicitly mentioned in this paper:
Lack of access to global training data makes it harder to identify unwanted biases entering the training (Wikipedia ref). The root cause is non-IID data. FedAvg may or may not suffer from this seriously; other algorithms can be considered, e.g., https://arxiv.org/pdf/1907.01132.pdf, or ref2 on page 3.
Open-source Federated Learning frameworks supporting DP: PySyft (which also supplies Homomorphic Encryption) and Flower.