Contribution Valuation in Federated Learning

Zhenan Fan

Department of Computer Science

Collaborators:

Huang Fang, Zirui Zhou, Yong Zhang, Jian Pei, Michael Friedlander

Contribution Valuation

Key requirements
1. Data owners with similar data should receive similar valuation.
2. Data owners with unrelated data should receive low valuation.

Shapley Value

The Shapley value measures each player's contribution in a cooperative game.
Advantage 
It satisfies many desired fairness axioms. 
Drawback 
Computing utilities requires retraining the model. 
v(i) = \frac{1}{N} \sum\limits_{S \subseteq [N] \setminus \{i\}} \frac{1}{\binom{N-1}{|S|}} \left[ U(S \cup \{i\}) - U(S) \right]
where i is a player, U(S) is the utility (model performance) achieved by the players in S, and the bracketed term is the marginal utility gain contributed by player i.
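A minimal Python sketch of this definition, assuming a black-box utility function that maps a coalition (a frozenset of player indices) to a real number; the exact computation enumerates all subsets, so it is only practical for small N.

    from itertools import combinations
    from math import comb

    def shapley_values(N, utility):
        """Exact Shapley values for players 0..N-1, given utility(S) for a
        coalition S passed as a frozenset of player indices."""
        values = [0.0] * N
        for i in range(N):
            others = [p for p in range(N) if p != i]
            for size in range(N):
                for S in combinations(others, size):
                    S = frozenset(S)
                    marginal = utility(S | {i}) - utility(S)
                    # weight 1 / (N * C(N-1, |S|)), matching the formula above
                    values[i] += marginal / (N * comb(N - 1, size))
        return values

For example, shapley_values(3, lambda S: len(S)) returns [1.0, 1.0, 1.0], splitting the grand coalition's utility equally among three symmetric players.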

Federated Shapley Value

[Wang et al.'20] propose computing a Shapley value in each communication round, which eliminates the need to retrain the model.
v_t(i) = \frac{1}{M} \sum\limits_{S \subseteq [M] \setminus \{i\}} \frac{1}{\binom{M-1}{|S|}} [U_t(S \cup \{i\}) - U_t(S)]
v(i) = \sum\limits_{t=1}^T v_t(i)
Fairness
Symmetry
U_t(S\cup\{i\}) = U_t(S\cup\{j\}) \quad \forall t, S \Rightarrow v(i) = v(j)
Zero contribution
U_t(S\cup\{i\}) = U_t(S) \quad \forall t, S \Rightarrow v(i) = 0
Additivity
U_t = U^1_t + U^2_t \Rightarrow v(i) = v^1(i) + v^2(i)
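A small sketch of the accumulation over rounds, reusing the shapley_values routine above and assuming a list round_utilities with one per-round utility function U_t:

    def federated_shapley(M, round_utilities):
        """Federated Shapley value: sum the per-round values v_t(i) over rounds."""
        totals = [0.0] * M
        for U_t in round_utilities:              # one utility function per round t
            v_t = shapley_values(M, U_t)         # per-round Shapley values
            totals = [v + w for v, w in zip(totals, v_t)]
        return totals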

Horizontal Federated Learning

\min\limits_{w \in \mathbb{R}^d}\enspace F(w) \coloneqq \sum\limits_{i=1}^{M} f_i(w) \enspace\text{with}\enspace f_i(w) \coloneqq \frac{1}{|\mathcal{D}_i|} \sum\limits_{(x, y) \in \mathcal{D}_i} \ell(w; x, y)
where w is the model, M is the number of clients, \mathcal{D}_i is client i's local dataset, and \ell is the loss function.

FedAvg 

[McMahan et al.'17]
Each selected client i \in S^t starts from the global model w^t and runs K local stochastic gradient steps:
w_i^{t+1} \leftarrow w_i^t - \eta \tilde \nabla f_i(w_i^t) \quad (K \text{ times})
The server averages the resulting local models:
w^{t+1} \leftarrow \frac{1}{|S^t|}\sum_{i \in S^t} w_i^{t+1}
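A schematic sketch of one FedAvg round, assuming each client i exposes a stochastic gradient oracle client_grads[i]; the step size, number of local steps, and client sampling are placeholders:

    import numpy as np

    def fedavg_round(w_global, client_grads, selected, eta=0.1, K=5):
        """One FedAvg round: each selected client runs K local SGD steps from
        the current global model, then the server averages the local models."""
        local_models = []
        for i in selected:
            w_i = w_global.copy()
            for _ in range(K):
                w_i = w_i - eta * client_grads[i](w_i)   # local stochastic step
            local_models.append(w_i)
        return np.mean(local_models, axis=0)             # server-side averaging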

Utility Function

Test data set (server)
\mathcal{D}_c
U_t(S) = \sum\limits_{(x, y) \in \mathcal{D}_c} \left[ \ell(w^t; x,y) - \ell(w_S^{t+1}; x,y) \right] \enspace\text{where}\enspace w_S^{t+1} = \frac{1}{|S|} \sum\limits_{i \in S} w_i^{t+1}
v_t(i) = \begin{cases} \frac{1}{|S^t|} \sum\limits_{S \subseteq S^t \setminus\{i\}} \frac{1}{\binom{|S^t|-1}{|S|}} \left[U_t(S\cup\{i\}) - U_t(S)\right] & i \in S^t \\ 0 & i \notin S^t \end{cases}
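A sketch of this round-t utility, assuming the server holds a test set of (x, y) pairs and a per-example loss function loss(w, x, y); feeding it into the exact Shapley routine above over the clients in S^t (relabelled 0..|S^t|-1) gives v_t(i):

    import numpy as np

    def round_utility(w_prev, client_models, S, test_set, loss):
        """U_t(S): loss reduction on the server's test set when averaging the
        round-t models of the clients in S, relative to the previous global model."""
        if not S:
            return 0.0                   # empty coalition keeps the previous model
        w_S = np.mean([client_models[i] for i in S], axis=0)
        return sum(loss(w_prev, x, y) - loss(w_S, x, y) for x, y in test_set)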
Problem: In round t, the server only has  
\{w_i^{t+1}\}_{i \in S^t}
[Wang et al.'20] 

Possible Unfairness

Clients with identical local datasets may receive very different valuations.   
Same local datasets
\mathcal{D}_i = \mathcal{D}_j
Relative difference
d_{i,j} = \frac{|v(i) - v(j)|}{\max\{v(i), v(j)\}}
Empirical probability
\mathbb{P}( d_{i,j} > 0.5) > 65\% \quad \red{\text{unfair!}}

Low Rank Utility Matrix

Utility matrix
\mathcal{U} \in \mathbb{R}^{T \times 2^M} \enspace\text{with}\enspace \mathcal{U}_{t, S} = U_t(S)
This matrix is only partially observed; if we can recover the missing entries, we can compute a fair valuation for every client.

Theorem
If the loss function is smooth and strongly convex, then
\red{\mathop{rank}_\epsilon}(\mathcal{U}) \in \mathcal{O}(\frac{\log(T)}{\epsilon})
[Fan et al.'22] 
\red{ \mathop{rank}_\epsilon(X) = \min\{\mathop{rank}(Z) \mid \|Z - X\|_{\max} \leq \epsilon\} }
[Udell & Townsend'19] 

Empirical Results: Singular Value Decomposition

Matrix Completion

\min\limits_{\substack{W \in \mathbb{R}^{T \times r}\\ H \in \mathbb{R}^{2^M \times r}}} \enspace \sum_{t=1}^T\sum_{S\subseteq S^t} (\mathcal{U}_{t,S} - w_t^Th_{S})^2 + \lambda(\|W\|_F^2 + \|H\|_F^2)
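A minimal sketch of this completion step using plain gradient descent on the factorized objective, assuming the observed entries are supplied as (round, column-index, value) triples; a practical implementation would use alternating least squares or an off-the-shelf solver, and would index only the subsets that can actually be observed:

    import numpy as np

    def complete_utility_matrix(observed, T, n_cols, r=10, lam=0.1,
                                eta=0.01, n_iters=500, seed=0):
        """Recover a low-rank utility matrix from observed entries via the
        regularized factorization U ~ W H^T (gradient descent on the objective)."""
        rng = np.random.default_rng(seed)
        W = 0.1 * rng.standard_normal((T, r))
        H = 0.1 * rng.standard_normal((n_cols, r))
        for _ in range(n_iters):
            gW, gH = 2 * lam * W, 2 * lam * H            # gradient of the regularizer
            for t, s, u in observed:                     # (round t, column of S, U_t(S))
                err = W[t] @ H[s] - u
                gW[t] += 2 * err * H[s]
                gH[s] += 2 * err * W[t]
            W -= eta * gW
            H -= eta * gH
        return W @ H.T                                   # completed utility matrix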
Empirical evaluation (figure omitted): for clients with identical local datasets (\mathcal{D}_i = \mathcal{D}_j), the slide plots the empirical CDF \mathbb{P}(d_{i,j} < t) of the relative difference d_{i,j} = \frac{|v(i) - v(j)|}{\max\{v(i), v(j)\}}.

Vertical Federated Learning

\min\limits_{\theta_1, \dots, \theta_M}\enspace F(\theta_1, \dots, \theta_M) \coloneqq \frac{1}{N}\sum\limits_{i=1}^{N} \ell\bigg(\sum_{m=1}^M h^m_i; y_i\bigg) \enspace\text{with}\enspace h^m_i = \langle \theta_m, x_i^m \rangle
where \theta_1, \dots, \theta_M are the local models, h^m_i are the local embeddings, and N is the number of training samples.
Only embeddings are communicated between the server and the clients.

FedBCD

[Liu et al.'22]
Server selects a mini-batch
B^t \subseteq [N]
Each client m computes local embeddings
\{ (h_i^m)^t = \langle \theta_m^t, x_i^m \rangle \mid i \in B^t \}
Server computes the per-sample gradients
\{g_i^t = \frac{\partial \ell(h_i^t; y_i)}{\partial h_i^t} \mid i \in B^t\} \enspace\text{where}\enspace h_i^t = \sum\limits_{m=1}^M (h_i^m)^t
Each client m updates local model 
\theta_m^{t+1} \leftarrow \theta_m^t - \frac{\eta^t}{|B^t|} \sum\limits_{i \in B^t} g_i^t x_i^m
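A schematic sketch of one FedBCD round for the linear embeddings above, assuming NumPy feature blocks X[m] (one per client, each of shape N x d_m), labels y in {-1, +1}, and a logistic loss; the loss choice is an assumption for concreteness, not stated on the slide:

    import numpy as np

    def fedbcd_round(thetas, X, y, batch, eta):
        """One FedBCD round with linear local models and an assumed logistic loss."""
        M = len(thetas)
        # 1. Each client computes local embeddings on the server-chosen mini-batch.
        H = [X[m][batch] @ thetas[m] for m in range(M)]
        # 2. The server aggregates embeddings and computes per-sample gradients.
        h = sum(H)                                        # h_i = sum_m <theta_m, x_i^m>
        g = -y[batch] / (1.0 + np.exp(y[batch] * h))      # d loss / d h_i (logistic loss)
        # 3. Each client updates its local model with the returned gradients.
        for m in range(M):
            thetas[m] = thetas[m] - (eta / len(batch)) * (X[m][batch].T @ g)
        return thetas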

Utility Function

U_t(S) = \frac{1}{N}\sum\limits_{i=1}^N \ell\bigg(\sum\limits_{m=1}^M (h^m_i)^{t-1}; y_i\bigg) - \frac{1}{N}\sum\limits_{i=1}^N \ell\bigg(\sum\limits_{m\in S} (h^m_i)^{t} + \sum\limits_{m\notin S} (h^m_i)^{t-1}; y_i\bigg)
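A sketch of this round-t utility under the same assumed logistic loss, pretending the server had every client's embeddings H_prev[m] and H_curr[m] (length-N arrays from rounds t-1 and t); the problem noted next is that only the mini-batch entries are actually observed:

    import numpy as np

    def vfl_round_utility(S, H_prev, H_curr, y):
        """U_t(S): drop in average (assumed logistic) loss when the clients in S
        refresh their embeddings and all other clients keep the round-(t-1) ones."""
        M = len(H_prev)
        loss = lambda h: np.log1p(np.exp(-y * h))
        h_old = sum(H_prev)                                                # all round t-1
        h_new = sum(H_curr[m] if m in S else H_prev[m] for m in range(M))  # mixed
        return np.mean(loss(h_old)) - np.mean(loss(h_new))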
Problem: In round t, the server only has  
\{ (h_i^m)^t \mid i \in B^t \}
Embedding matrix
\mathcal{H}^m \in \mathbb{R}^{T \times N} \enspace\text{with}\enspace \mathcal{H}^m_{t, i} = (h_i^m)^t
Theorem
If the loss function is smooth, then
\mathop{rank}_\epsilon(\mathcal{H}^m) \in \mathcal{O}(\frac{\log(T)}{\epsilon})
[Fan et al.'22] 
