Grimmer Kang
The main advantage of FL (Federated Learning) is privacy: the training data is never uploaded to the aggregation server; only model parameters are uploaded.
Data Heterogeneity: this is about non-IID (not independent and identically distributed) data; the IID assumption only holds for traditional distributed training. Non-IID data could in principle be handled at an extreme communication cost, which is not practical, so the solution below (FedAvg) is used instead (a sketch of a label-skewed, non-IID client split follows).
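A minimal sketch of what a non-IID (label-skewed) client partition can look like; the dataset, shard counts, and function name are illustrative assumptions, not from the paper:

```python
import numpy as np

def label_skew_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Illustrative non-IID split: sort samples by label, cut them into
    shards, and give each client only a few shards (so few label classes)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)  # group sample indices by label
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))
    clients = []
    for c in range(num_clients):
        picked = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append(np.concatenate([shards[s] for s in picked]))
    return clients  # list of index arrays, one per client

# Example: 10 classes, 100 clients -> each client sees roughly 2 classes.
labels = np.random.randint(0, 10, size=60_000)
parts = label_skew_partition(labels, num_clients=100)
```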
FedAvg (Federated Averaging): the server averages the clients' model weights. Uploading weights after several local training steps, instead of per-step gradients, also saves communication overhead (see the sketch below).
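A minimal FedAvg aggregation sketch, weighting each client by its number of local samples; representing a model as a list of NumPy arrays is an assumption for illustration:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of per-client model weights.

    client_weights: list of models, each a list of np.ndarray layer weights
    client_sizes:   number of local training samples per client
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=np.float64)
        for weights, n in zip(client_weights, client_sizes):
            acc += (n / total) * weights[layer]
        averaged.append(acc)
    return averaged  # becomes the new global model for the next round
```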
System Heterogeneity: this is about clients with different hardware, scalability, and production-quality system design. Both solutions below apply FedAvg together with secure (parameter) aggregation/computation, which is introduced in the next section.
TOWARDS FEDERATED LEARNING AT SCALE: SYSTEM DESIGN (mobile devices, TensorFlow)
Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach (PySyft)
Membership Inference Attack: a privacy attack that tries to infer whether a particular sample was part of the training data. Defenses:
Secure Computation: Homomorphic Encryption, Secure Multiparty Computation (SMC), etc. (a minimal secure-aggregation masking sketch follows this list)
Differential Privacy (DP)
Trusted Execution Environment.
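As a rough illustration of the secure-aggregation idea (not any specific published protocol): clients add pairwise masks that cancel out when the server sums the uploads, so only the aggregate is revealed. The shared per-pair seeds are assumed to be agreed out of band:

```python
import numpy as np

def masked_update(update, client_id, peer_ids, pair_seeds):
    """Add pairwise masks to one client's update. For each pair (i, j) with a
    shared seed, client i adds +mask and client j adds -mask, so the masks
    cancel in the server-side sum."""
    masked = update.copy()
    for peer in peer_ids:
        seed = pair_seeds[frozenset((client_id, peer))]  # shared out of band
        mask = np.random.default_rng(seed).normal(size=update.shape)
        masked += mask if client_id < peer else -mask
    return masked

# Server side: sum of masked updates == sum of true updates (masks cancel),
# while each individual masked update looks like noise.
```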
Data Poisoning & Model Poisoning Attacks (malicious clients try to reduce model accuracy). Defenses:
The most common defense measures are rejections based on error rate and loss function, i.e., try to detect and remove unreasonable model-parameter updates (a sketch follows).
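A hedged sketch of that rejection idea: score each proposed update on a small server-side validation set and drop updates that worsen the loss beyond a tolerance. The evaluate function, validation set, and tolerance are assumptions for illustration:

```python
def filter_updates(global_weights, updates, evaluate, tol=0.02):
    """Keep only updates that do not degrade validation loss too much.

    evaluate(weights) -> validation loss on a small held-out set (placeholder
    for the real evaluation pipeline).
    """
    base_loss = evaluate(global_weights)
    accepted = []
    for upd in updates:
        candidate = [w + d for w, d in zip(global_weights, upd)]
        if evaluate(candidate) <= base_loss + tol:
            accepted.append(upd)  # looks reasonable, keep it
        # else: drop the update as a suspected poisoned contribution
    return accepted
```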
Backdoor Attacks (malicious clients mislabel specific (sub)tasks so the model misbehaves on them while overall accuracy stays high). Defenses:
Participant-level differential privacy
Norm thresholding of updates (detect and scale down or remove unreasonable updates; a clipping sketch follows this list)
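A minimal norm-thresholding sketch; the threshold value is an assumed hyperparameter, and a real system might reject an over-norm update outright instead of scaling it:

```python
import numpy as np

def clip_update(update, max_norm=1.0):
    """Norm thresholding: scale down any update whose L2 norm exceeds
    max_norm, which limits how far a single (possibly backdoored) client
    can move the global model."""
    flat = np.concatenate([layer.ravel() for layer in update])
    norm = np.linalg.norm(flat)
    if norm <= max_norm:
        return update
    scale = max_norm / norm
    return [layer * scale for layer in update]
```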
FINAL REMARKS: Is it possible to develop a Byzantine-tolerant FL model while ensuring user privacy, using schemes with low computational cost?
From the NVIDIA reference:
ref2: The aggregator randomly selects a client as the leader, who generates an HE key pair and synchronizes it to all the other clients.
ref3: Distributed Additive Encryption and Quantization for Privacy Preserving Federated Deep Learning (addresses the residual risk in ref2, where the server is needed to broadcast the HE key pair).
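A minimal sketch of additive HE aggregation using the python-paillier (phe) library; in the ref2 scheme the leader client would generate this key pair and share it with the other clients, never with the server. Scalars instead of full weight vectors are used here for brevity:

```python
from phe import paillier  # pip install phe (python-paillier)

# A randomly selected leader client generates the key pair and shares it
# with the other clients; the server never sees the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (scalar, for brevity) model parameter.
client_params = [0.12, -0.05, 0.30]
ciphertexts = [public_key.encrypt(p) for p in client_params]

# The server adds ciphertexts without decrypting anything (additive HE).
encrypted_sum = ciphertexts[0] + ciphertexts[1] + ciphertexts[2]

# Only a key-holding client can decrypt the aggregate.
average = private_key.decrypt(encrypted_sum) / len(client_params)
print(average)  # ~0.1233
```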
In differential privacy schemes, a user's contribution is masked by clipping the model parameters (or update) and adding noise to them before model aggregation; the added noise costs some model accuracy (a sketch of this clip-and-noise step follows).
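A minimal sketch of that clip-then-add-noise step for one client update; the clip bound and noise multiplier are assumed hyperparameters, and a real DP-FL system would calibrate the noise to a target epsilon/delta:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, seed=None):
    """Clip the update to a fixed L2 norm, then add Gaussian noise.
    Clipping bounds each user's contribution; the noise masks it."""
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise  # this is what gets sent for aggregation
```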
From Wikipedia (randomized response example):
A simple example, especially developed in the social sciences, is to ask a person to answer the question "Do you own the attribute A?", according to the following procedure:
Toss a coin.
If heads, then toss the coin again (ignoring the outcome), and answer the question honestly.
If tails, then toss the coin again and answer "Yes" if heads, "No" if tails.
But, overall, these data with many responses are significant: a "Yes" is given by a quarter of the people who do not have attribute A and by three-quarters of those who do, so the true proportion can still be estimated (see the sketch below).
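A short sketch of why the noisy answers are still useful in aggregate: with this coin procedure the expected fraction of "Yes" answers is 0.25 + 0.5*p, where p is the true proportion owning attribute A, so p can be estimated as 2*(yes_fraction) - 0.5. The simulated population below is an illustrative assumption:

```python
import numpy as np

def randomized_response(has_attribute, rng):
    """One respondent following the coin procedure above."""
    if rng.random() < 0.5:        # heads: answer honestly
        return has_attribute
    return rng.random() < 0.5     # tails: toss again, answer Yes on heads

rng = np.random.default_rng(0)
true_p = 0.30                     # true proportion owning attribute A
answers = [randomized_response(rng.random() < true_p, rng)
           for _ in range(100_000)]

yes_fraction = np.mean(answers)
estimated_p = 2 * yes_fraction - 0.5  # since E[yes] = 0.25 + 0.5 * p
print(round(estimated_p, 3))          # close to 0.30
```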
Extra issues not explicitly mentioned in this paper:
Lack of access to global training data makes it harder to identify unwanted biases entering the training (Wikipedia ref). The root cause is non-IID data. FedAvg may or may not suffer from this seriously; other algorithms can be considered, e.g., https://arxiv.org/pdf/1907.01132.pdf, or ref2 on page 3.
Open-source Federated Learning frameworks supporting DP: PySyft (which also supplies Homomorphic Encryption) and Flower.