Duality in
Structured and Federated Optimization
Zhenan Fan
Microsoft Research Asia
November 22, 2022
Outline
1. Duality in Optimization
2. Structured Optimization
3. Federated Learning
Duality in Optimization


Primal and Dual
Optimization is everywhere:
- machine learning
- signal processing
- data mining
Every primal problem comes paired with a dual problem (sketched below).
Weak duality (always holds)
Strong duality (under a suitable constraint qualification)
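For reference, a generic convex primal-dual pair in Fenchel-Rockafellar form (a sketch in notation of my choosing; the talk's exact problem classes may differ):

```latex
% Generic primal-dual pair (sketch; symbols f, g, A are assumptions)
\text{(P)}\quad p^\star \;=\; \min_{x}\; f(x) + g(Ax),
\qquad
\text{(D)}\quad d^\star \;=\; \max_{y}\; -f^*(-A^\top y) - g^*(y).
```

Weak duality, p* >= d*, always holds; strong duality, p* = d*, holds under a constraint qualification such as Slater's condition.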


Dual Optimization
Possible advantages
- parallelizable [Boyd et al.'11]
- better convergence rate [Shalev-Shwartz & Zhang'13]
- smaller dimension [Friedlander & Macêdo'16]
Possible dual formulations
- Fenchel-Rockafellar dual [Rockafellar'70]
- Lagrangian dual [Boyd & Vandenberghe'04]
- Gauge dual [Friedlander, Macêdo & Pong'14]
(All of these dual formulations can be interpreted through the perturbation framework of [Rockafellar & Wets'98]; see the sketch below.)
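A sketch of that perturbation framework: choose a perturbation function F(x, u) with F(x, 0) equal to the primal objective; different choices of F recover the different duals. (Notation mine.)

```latex
% Perturbation framework of [Rockafellar & Wets'98] (sketch)
v(u) \;=\; \inf_x F(x, u)                      % value function
\qquad
p^\star \;=\; v(0) \;=\; \inf_x F(x, 0)        % primal problem
\qquad
d^\star \;=\; \sup_y\, -F^*(0, y) \;=\; v^{**}(0)  % dual problem
```

Weak duality is the inequality v**(0) <= v(0); strong duality holds exactly when the value function v is closed and convex at 0.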
Structured Optimization


Structured Data-Fitting
Atomic decomposition: mathematical modelling for structure
[Chen, Donoho & Saunders'01; Chandrasekaran et al.'12]
An observation is linked by a linear map to variables that superpose structured components (sparse, low-rank, smooth); each component is a weighted sum of atoms drawn from an atomic set, and the cardinality of the decomposition counts the atoms used.
Examples of atomic sets:
- sparse n-vectors: signed canonical basis vectors
- low-rank matrices: unit-norm rank-one matrices
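In symbols (a reconstruction of the standard setup from [Chandrasekaran et al.'12]; symbol names are mine):

```latex
% Observation b seen through linear map M; x decomposes over atoms
b \;\approx\; Mx,
\qquad
x \;=\; \sum_{a \in \mathcal{A}} c_a\, a, \quad c_a \ge 0,
\qquad
\text{cardinality} \;=\; \#\{a \in \mathcal{A} : c_a > 0\},
```

where each a is an atom, c_a its weight, and A the atomic set.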

Example: Separating Stars and Galaxies
[Chen, Donoho & Saunders'98; Donoho & Huo'01]

Example: Separating Chessboard and Chess Pieces

[Chandrasekaran et al.'09; Candès et al.'09]

Example: Multiscale Low-rank Decomposition

[Ong & Lustig'16]

Roadmap
- Convex relaxation with guarantees
- Primal-dual relationship and dual-based algorithm
- Efficient primal-retrieval strategy
Fan, Z., Jeong, H., Joshi, B., & Friedlander, M. P. Polar deconvolution of mixed signals. IEEE Transactions on Signal Processing (2021).
Fan, Z., Jeong, H., Sun, Y., & Friedlander, M. P. Atomic decomposition via polar alignment: The geometry of structured optimization. Foundations and Trends® in Optimization (2020).
Fan, Z., Fang, H., & Friedlander, M. P. Cardinality-constrained structured data-fitting problems. To appear in Open Journal of Mathematical Optimization (2022).

Convex Relaxation
Gauge function: a sparsity-inducing regularizer [Chandrasekaran et al.'12]
Examples:
- sparse n-vectors: the gauge is the l1-norm
- low-rank matrices: the gauge is the nuclear norm
Structured convex optimization [FJJF, IEEE-TSP'21]: minimize the gauge (the structure assumption) subject to a data-fitting constraint; minimizing the gauge function promotes atomic sparsity! (See the sketch below.)
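The gauge induced by an atomic set, and the resulting structured convex program (standard definitions; the norm and tolerance in the constraint are my assumptions about the exact data-fitting term):

```latex
% Gauge function induced by the atomic set A
\gamma_{\mathcal{A}}(x) \;=\; \inf\Big\{ \sum_{a \in \mathcal{A}} c_a \;:\; x = \sum_{a \in \mathcal{A}} c_a a,\;\; c_a \ge 0 \Big\}

% Structured convex relaxation: gauge objective + data-fitting constraint
\min_{x}\;\; \gamma_{\mathcal{A}}(x)
\quad \text{subject to} \quad \|Mx - b\| \le \epsilon
```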

Recovery Guarantee
Theorem [FJJF, IEEE-TSP'21]
If the ground-truth signals are incoherent and the measurements are Gaussian, then, with high probability, the decomposition is stably recovered.


Primal-dual Correspondence
Primal problem and dual problem
Theorem [FSJF, FNT-OPT'21]
Let x* and y* denote optimal primal and dual solutions. Under mild assumptions, x* and y* are polar-aligned, so the atoms that support x* can be identified from y*.

Dual-based Algorithm
A variant of the level-set method developed by [Aravkin et al.'18].
(The projection can be computed approximately using Frank-Wolfe; see the sketch below.)
Complexity: measured in projection steps or Frank-Wolfe steps (explicit bounds omitted here).
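A minimal sketch of the approximate projection via Frank-Wolfe, for the concrete case where the atomic set consists of signed canonical vectors (so the feasible set is the l1-ball); function and variable names are illustrative, not AtomicOpt.jl's API:

```python
import numpy as np

def fw_project(b, tau, steps=200):
    """Approximately project b onto tau * conv(A) by Frank-Wolfe,
    for A = {+/- e_i}: the feasible set is the l1-ball of radius tau."""
    x = np.zeros_like(b)
    for t in range(steps):
        grad = x - b                          # gradient of 0.5 * ||x - b||^2
        i = int(np.argmax(np.abs(grad)))      # linear minimization oracle:
        s = np.zeros_like(b)                  #   the best atom is a signed
        s[i] = -tau * np.sign(grad[i])        #   coordinate, scaled by tau
        eta = 2.0 / (t + 2.0)                 # standard Frank-Wolfe step size
        x = (1.0 - eta) * x + eta * s         # convex combination stays feasible
    return x

# usage: approximate distance from b to the scaled atomic-set ball
b = np.random.default_rng(0).normal(size=50)
print(np.linalg.norm(fw_project(b, tau=1.0) - b))
```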

Primal-retrieval Strategy
Can we retrieve primal variables from a near-optimal dual variable?
Theorem [FFF, Submitted'22]
Let δ denote the duality gap. Under mild assumptions, the error of the retrieved primal variable is controlled by δ.
(Primal retrieval is equivalent to an unconstrained least-squares problem when the atomic sets are symmetric.)
Open-source Package https://github.com/MPF-Optimization-Laboratory/AtomicOpt.jl
Federated Learning


Motivation
Setting: decentralized data sets, privacy concerns.
Definition: federated learning is a collaborative learning framework that keeps data sets private.

Horizontal and Vertical Federated Learning
In horizontal FL, clients share the same feature space but hold different samples; in vertical FL, clients share the same samples but hold different features.

Roadmap
- Federated optimization
  Fan, Z., Fang, H., & Friedlander, M. P. FedDCD: A dual approach for federated learning. Submitted (2022).
- Knowledge-injected federated learning
  Fan, Z., Zhou, Z., Pei, J., Friedlander, M. P., Hu, J., Li, C., & Zhang, Y. Knowledge-injected federated learning. Submitted (2022).
- Contribution valuation in federated learning
  Fan, Z., Fang, H., Zhou, Z., Pei, J., Friedlander, M. P., Liu, C., & Zhang, Y. Improving fairness for data valuation in horizontal federated learning. IEEE International Conference on Data Engineering (ICDE 2022).
  Fan, Z., Fang, H., Zhou, Z., Pei, J., Friedlander, M. P., & Zhang, Y. Fair and efficient contribution valuation for vertical federated learning. Submitted (2022).

Federated Optimization
Important features of federated optimization:
- communication efficiency
- data privacy
- data heterogeneity
- computational constraints
Problem: min_w (1/M) sum_{m=1}^M f_m(w; D_m), where w is the model, M the number of clients, D_m the local dataset, and f_m the loss function.

Primal-based Algorithm
- FedAvg [McMahan et al.'17]
- SCAFFOLD [Karimireddy et al.'20]
(A minimal FedAvg sketch follows.)
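A minimal FedAvg sketch on a toy least-squares problem with full client participation (my toy setup, not the paper's experiments):

```python
import numpy as np

def fedavg(clients, w, rounds=100, local_steps=10, lr=0.1):
    """FedAvg [McMahan et al.'17]: clients run local gradient steps on their
    own data, the server averages the returned models. Each client holds
    (A_m, b_m), giving local loss 0.5 * ||A_m w - b_m||^2 / n_m."""
    for _ in range(rounds):
        local_models = []
        for A, b in clients:                    # in practice, a sampled subset
            w_m = w.copy()
            for _ in range(local_steps):        # local gradient descent
                w_m -= lr * A.T @ (A @ w_m - b) / len(b)
            local_models.append(w_m)
        w = np.mean(local_models, axis=0)       # server-side aggregation
    return w

# usage: three clients with heterogeneous local data
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
w = fedavg(clients, w=np.zeros(5))
```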

Dual-based Algorithm
Federated dual coordinate descent (FedDCD) [FFF, Submitted'22]
1. Each selected client approximately computes its dual gradient and uploads it to the server.
2. The server adjusts the gradients (to maintain dual feasibility) and broadcasts them to the selected clients.
3. Each selected client locally updates its dual model.
(An extension of [Necoara et al.'17]: inexact gradients, acceleration.)
Notation: f_m^* is the conjugate function; y_m is the local dual model. (See the sketch below.)
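One way to write the dual problem that FedDCD works on, following the dual-decomposition pattern of [Necoara et al.'17] (a sketch; the paper's exact formulation may differ):

```latex
% Primal federated objective and its Fenchel dual (sketch)
\min_{w}\; \frac{1}{M}\sum_{m=1}^{M} f_m(w)
\qquad\Longleftrightarrow\qquad
\max_{y_1,\dots,y_M}\; -\frac{1}{M}\sum_{m=1}^{M} f_m^*(y_m)
\quad\text{s.t.}\quad \sum_{m=1}^{M} y_m = 0,
```

where f_m^* is the conjugate of the local loss and y_m the local dual model; the server's gradient adjustment keeps the iterates on the feasible subspace sum_m y_m = 0.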

Communication Rounds
[Figure: communication rounds to reach a target accuracy; experimental setting omitted.]
Open-source Package https://github.com/ZhenanFanUBC/FedDCD.jl

Knowledge-Injected Federated Learning

Coal-Mixing in the Coking Process
- Challenging: there is no direct formula
- Based on experience and expert knowledge
- Largely affects cost

Task Description
Goal: improve the expert's prediction model with machine learning.
Data scarcity: collecting data is expensive and time-consuming.
We unite four coking companies to work on this task collaboratively.
Challenges:
- local datasets have different distributions
- companies have different expert (knowledge) models
- privacy of local datasets and knowledge models has to be preserved

Multiclass Classification
Setting: a training set {(x_i, y_i)} of data instances x (features of the raw coal) from a feature space X, with labels y (quality of the final coke) from a label space Y, drawn from a data distribution.
Task: learn a classifier f : X -> Y.

Knowledge-based Models
Prediction-type Knowledge Model (P-KM): e.g., mechanistic prediction models, such as a differential equation that describes the underlying physical process.
Range-type Knowledge Model (R-KM): e.g., can be derived from the causality of the input-output relationship. (One plausible formalization follows.)
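One plausible formalization of the two knowledge models (my notation; an assumption, not necessarily the paper's):

```latex
% P-KM: expert point predictions; R-KM: admissible label ranges
p_m : \mathcal{X} \to \mathcal{Y}
\qquad
r_m : \mathcal{X} \rightrightarrows \mathcal{Y}
\qquad
\text{consistency:}\;\; f_m(x) \in r_m(x) \;\;\text{for all } x \in \mathcal{X}.
```

The "for all x" requirement is what produces the infinitely many constraints discussed on a later slide.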

Federated Learning with Knowledge-based Models
M clients and a central server.
Each client m has a local dataset drawn from a conditional data distribution depending on m, together with its own P-KM and R-KM.

Task Description
Design a federated learning framework such that:
- each client m obtains a personalized predictive model
- clients can benefit from the other clients' datasets and knowledge
- privacy of local datasets and local KMs is protected

Direct Formulation Invokes Infinitely Many Constraints
Even in a simple setting, requiring the learned model to respect the R-KM at every input yields infinitely many constraints, a challenging optimization problem.

Architecture Design
The server provides a general deep learning model with learnable model parameters.
Each client applies a function transformation that combines the server's model with its own P-KM and R-KM; the result is that client's personalized model.
Properties of Personalized Model

Optimization
Optimization problem: minimize the global loss, the average of the clients' local losses, e.g., with FedAvg [McMahan et al.'17].
Most existing horizontal federated learning algorithms can be applied to solve this optimization problem!
Numerical Results (Case Study)
[Figures: test accuracy and percentage of violation.]
Open-source Package https://github.com/ZhenanFanUBC/FedMech.jl

Contribution Valuation in Federated Learning
Key requirements:
1. Data owners with similar data should receive similar valuations.
2. Data owners with unrelated data should receive low valuations.

Shapley Value
The Shapley value measures a player's contribution in a cooperative game: player i's value averages the marginal utility gain of adding i over subsets S of players, where the utility created by the players in S is the performance of the model trained with their data. (See the formula below.)
Advantage: it satisfies many desired fairness axioms.
Drawback: computing the utilities requires retraining the model.
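The standard definition, with N the set of players and U(S) the utility created by the players in S:

```latex
\phi_i \;=\; \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,\big(|N| - |S| - 1\big)!}{|N|!}\,
\Big( U(S \cup \{i\}) - U(S) \Big)
```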

Horizontal Federated Learning
Problem: min_w (1/M) sum_{m=1}^M f_m(w; D_m), where w is the model, M the number of clients, D_m the local dataset, and f_m the loss function.

Federated Shapley Value
[Wang et al.'20] propose computing the Shapley value in each communication round, which eliminates the need to retrain the model.
Fairness axioms: symmetry, zero contribution, additivity.
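In symbols, a sketch of the round-wise construction (my rendering of [Wang et al.'20]; U_t is the utility measured with the models available in round t):

```latex
\phi_i^{\,t} \;=\; \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
\Big( U_t(S \cup \{i\}) - U_t(S) \Big),
\qquad
\phi_i \;=\; \sum_{t=1}^{T} \phi_i^{\,t}.
```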

Utility Function
The server holds a test data set for measuring utility.
Problem: in round t, the server only has the model updates from the clients selected in that round [Wang et al.'20].

Possible Unfairness
Clients with identical local datasets may receive very different valuations.
[Figure: empirical probability of relative differences in valuation between clients with the same local datasets.]

Low Rank Utility Matrix
Utility matrix: rows indexed by communication rounds, columns by client subsets. The matrix is only partially observed, and fair valuation is possible if we can recover the missing values.
Theorem [Fan et al.'22]
If the loss function is smooth and strongly convex, then the utility matrix is approximately low-rank [cf. Udell & Townsend'19]. (A generic completion sketch follows.)
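A generic sketch of filling in the unobserved utilities by low-rank matrix completion with alternating least squares (one standard approach; not necessarily the solver used in the paper):

```python
import numpy as np

def complete_lowrank(U_obs, mask, rank=5, iters=50, reg=1e-2):
    """Complete a partially observed utility matrix via the factorization
    U ~ L @ R.T, fitting only the observed entries (mask == True)."""
    T, S = U_obs.shape
    rng = np.random.default_rng(0)
    L = rng.normal(size=(T, rank))
    R = rng.normal(size=(S, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        for t in range(T):                      # ridge solve for each row factor
            obs = mask[t]
            L[t] = np.linalg.solve(R[obs].T @ R[obs] + I, R[obs].T @ U_obs[t, obs])
        for s in range(S):                      # ridge solve for each column factor
            obs = mask[:, s]
            R[s] = np.linalg.solve(L[obs].T @ L[obs] + I, L[obs].T @ U_obs[obs, s])
    return L @ R.T                              # completed utility matrix
```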

Empirical Results: Singular Value Decomposition


Matrix Completion
[Figure: empirical CDF of relative differences in valuation between clients with the same local datasets, after completing the utility matrix.]


Vertical Federated Learning
Each client holds a local model that produces local embeddings.
Only embeddings are communicated between the server and the clients.


FedBCD [Liu et al.'22]
1. The server selects a mini-batch.
2. Each client m computes its local embeddings on the batch.
3. The server computes the gradient with respect to the embeddings.
4. Each client m updates its local model. (A sketch of one round follows.)
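A sketch of one such round under toy assumptions (linear local models, embeddings aggregated by summation, squared loss at the server; names are illustrative, not the paper's code):

```python
import numpy as np

def vertical_fl_round(clients, labels, batch, lr=0.1):
    """One FedBCD-style round: clients send mini-batch embeddings, the
    server returns the loss gradient w.r.t. each embedding, and every
    client applies the chain rule to update its own parameters."""
    H = [X[batch] @ w for X, w in clients]       # 1. local embeddings h_m
    pred = np.sum(H, axis=0)                     # 2. server aggregates ...
    g = (pred - labels[batch]) / len(batch)      #    ... and differentiates loss
    for X, w in clients:                         # 3. local model updates:
        w -= lr * X[batch].T @ g                 #    dL/dw_m = X_m^T g

# usage: two clients holding disjoint feature blocks of the same samples
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(100, 4)), np.zeros(4)) for _ in range(2)]
labels = rng.normal(size=100)
vertical_fl_round(clients, labels, batch=np.arange(32))
```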

Utility Function
Problem: in round t, the server only has the embedding matrix communicated by the clients.
Theorem [Fan et al.'22]
If the loss function is smooth, then the utility matrix is approximately low-rank.

Empirical Results: Approximate Rank


Experiment: Detection of Artificial Clients

These two works were done in part during my internship at Huawei Canada. Our code is publicly available in the Huawei AI Gallery.

Thank you! Questions?