Min-Hsiu Hsieh

Hon Hai (Foxconn) Quantum Computing Research Center

Challenge and Opportunity in Quantum Machine Learning

PME Special Quantum Seminar

Quantum Machine Learning


Why Quantum Computing?

Approximating the Jones polynomial is "BQP-complete".

Vaughan Jones - 1990 Fields Medal

- Aharonov, Jones, Landau, STOC 2006.

Why Machine Learning?

The setup of supervised learning:

  • Unknown function \(f: X\to Y\)

  • Training data \(\{(x_i,y_i)\}_{i=1}^N\)

  • Hypothesis set \(\mathcal{H}\)

  • A learning algorithm that outputs \(\hat{f}\)

Two costs: computational complexity and sample complexity.
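To make this pipeline concrete, here is a minimal classical sketch under illustrative choices (a 1-D regression task and a polynomial hypothesis set; every name below is mine, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown function f: X -> Y (hidden from the learner).
f = lambda x: np.sin(2 * np.pi * x)

# Training data {(x_i, y_i)}_{i=1}^N; sample complexity asks how N must scale.
N = 50
x = rng.uniform(0, 1, N)
y = f(x) + 0.1 * rng.normal(size=N)

# Hypothesis set H: polynomials of degree <= 5.
# Learning algorithm: least squares; its cost is the computational complexity.
degree = 5
Phi = np.vander(x, degree + 1)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# The learned hypothesis f_hat, evaluated on unseen points.
f_hat = lambda t: np.vander(np.atleast_1d(t), degree + 1) @ coef
t = np.linspace(0, 1, 100)
print("test MSE:", np.mean((f_hat(t) - f(t)) ** 2))
```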


Quantum Ingredients for Quantum Advantage

Two axes: type of input and type of algorithm.

The four combinations: CC, CQ, QC, QQ (first letter: type of input; second: type of algorithm).

CQ (classical input, quantum algorithm):

  • Linear equation solvers

  • Perceptron

  • Recommendation systems

  • Semidefinite programming

  • Many others (such as non-convex optimization)

QC / QQ (quantum input):

  • State tomography

  • Entanglement structure

  • Quantum control

Could QML achieve better end-to-end runtime?

QML Process: Many Challenges!

1. Readin

2. Readout

3. Learning Machines

4. Noise

QRAM

- V. Giovannetti, S. Lloyd, L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).

1. Readin

Input oracles for distributions:

[1] Aleksandrs Belovs. Quantum Algorithms for Classical Probability Distributions. 27th Annual European Symposium on Algorithms (ESA 2019), pp. 16:1-16:11.
O_p |0\rangle =\sum_{x\in\mathcal{X}} \sqrt{p_x} |x\rangle_A\otimes|\phi_x\rangle_B
O_p |0\rangle =\sum_{x\in\mathcal{X}} \sqrt{p_x} |x\rangle
O_{s} |0\rangle =\sum_{x\in\mathcal{X}} n^{-1/2} |x\rangle \otimes |\#_s(x)\rangle
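As a sanity check on the first oracle's action (ignoring the auxiliary register \(|\phi_x\rangle_B\)), the prepared amplitude vector is simply the entrywise square root of \(p\). A minimal numpy sketch with an assumed 8-outcome distribution:

```python
import numpy as np

# A probability distribution p over X = {0, ..., 7} (illustrative values).
p = np.array([0.30, 0.20, 0.15, 0.10, 0.10, 0.08, 0.05, 0.02])
assert np.isclose(p.sum(), 1.0)

# O_p |0> = sum_x sqrt(p_x) |x>: entry x of the statevector is sqrt(p_x).
state = np.sqrt(p)

# Measuring in the computational basis recovers p (Born rule).
assert np.allclose(np.abs(state) ** 2, p)
print(state)
```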

1. Readin

There is no general readin protocol (with runtime guarantee) for arbitrary datasets.

2. Readout

State tomography requires \(O(\frac{rd}{\epsilon^2})\) copies of \(\rho\), where

- \(r\) is the rank of \(\rho\),
- \(d\) is the dimension.

Observation:

For ML problems, the input and output obey structural relationships: for example, the solution of \(A\bm{x}=\bm{b}\) with \(A = \sum_{i} \sigma_i \bm{u}_i \bm{v}^\dagger_i\) lies in \(\text{row}(A)\), the span of the \(\bm{v}_i\).
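A quick numpy check of this observation (illustrative sizes): the minimum-norm solution of \(A\bm{x}=\bm{b}\) lies in \(\text{row}(A)\), the span of the right singular vectors with nonzero \(\sigma_i\):

```python
import numpy as np

rng = np.random.default_rng(1)

# A rank-r matrix A (m x n) and a consistent system A x = b.
m, n, r = 8, 6, 3
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
b = A @ rng.normal(size=n)

# Minimum-norm solution via the pseudoinverse.
x = np.linalg.pinv(A) @ b

# Project onto row(A) (top-r right singular vectors): x is unchanged.
U, s, Vt = np.linalg.svd(A)
P_row = Vt[:r].T @ Vt[:r]
assert np.allclose(P_row @ x, x)
print("the solution x lies in row(A), rank r =", r)
```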

2. Readout

[1] Efficient State Read-out for Quantum Machine Learning Algorithms. Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Physical Review Research 3, 043095 (2021). [arXiv:2004.06421]

2. Readout

Theorem:

Given

\(-\) input \(A\in\mathbb{R}^{m\times n}\) of rank \(r\),

\(-\) output \(\bm{v}\in\text{row}(A)\),

\(-\) access to QRAM,

the state \(|v\rangle\) can be read out with \(\text{poly}(r,\epsilon^{-1})\) queries to QRAM.

Proof sketch:

1. Write \(|v\rangle = \sum_{i=1}^r x_i |A_{g(i)}\rangle\in\text{row}(A)\).

2. Run a quantum Gram-Schmidt process algorithm to construct \(\{A_{g(i)}\}\).

3. Obtain the coefficients \(\{x_i\}\).
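A classical analogue of the three steps, assuming direct access to the rows of \(A\) (the quantum algorithm makes the corresponding QRAM queries); this is a sketch of the idea, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 10, 8, 3
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # rank-r matrix
v = A.T @ rng.normal(size=m)                            # some v in row(A)

# Step 2: Gram-Schmidt over rows until r independent rows A_{g(i)} are found.
basis, chosen = [], []
for i in range(m):
    w = A[i] - sum((A[i] @ q) * q for q in basis)
    if np.linalg.norm(w) > 1e-9:
        basis.append(w / np.linalg.norm(w))
        chosen.append(i)
    if len(basis) == r:
        break

# Step 3: coefficients x_i with v = sum_i x_i A_{g(i)}.
R = A[chosen]                                           # r x n
x, *_ = np.linalg.lstsq(R.T, v, rcond=None)
assert np.allclose(R.T @ x, v)
print("v reconstructed from rows", chosen)
```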


3. Learning Machines

Three aspects of a learning model:

  • Expressivity

  • Trainability

  • Generalization

"how the architectural properties of a neural network (depth, width, layer type) affect the resulting functions it can compute"

[1] On the Expressive Power of Deep Neural Networks. (ICML 2017) arXiv:1606.05336

3.1 Expressivity

(Figure: expressive power of parameterized quantum circuits \(\geq\) that of classical neural networks [1].)

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao. The Expressive Power of Parameterized Quantum Circuits. Physical Review Research 2, 033125 (2020). [arXiv:1810.11922]

3.1 Expressivity

(Figure: matching upper \(\leq\) and lower \(\geq\) bounds on the expressive power.)

"How easy is it to find the appropriate weights of the neural networks that fit the given data?"

3.2

Trainability

3.2

Trainability

Barren Plateau problem:

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 = \epsilon \leq 2^{-\text{poly}(n)},
\text{where } f(\bm{\theta},\rho) =\text{Tr}[O\, U(\bm{\theta})\rho U(\bm{\theta})^\dagger].
[1] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum neural network training landscapes. Nature communications, 9(1):1– 6, 2018. 
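A small Monte Carlo sketch of this quantity: an exact statevector simulation of an assumed hardware-efficient ansatz (RY layers followed by CZ chains, \(O = Z\) on qubit 0) estimates \(\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f\|^2\) via the parameter-shift rule; at these toy sizes the decay with \(n\) is only suggestive:

```python
import numpy as np

rng = np.random.default_rng(3)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]])          # RY(t) = exp(-i t Y / 2)

def layer_unitary(thetas, n):
    """One layer: RY on every qubit, then CZ on neighboring pairs."""
    U = ry(thetas[0])
    for t in thetas[1:]:
        U = np.kron(U, ry(t))
    for q in range(n - 1):
        cz = np.eye(2 ** n)
        for b in range(2 ** n):                 # sign flip when qubits q, q+1 are both 1
            if (b >> (n - 1 - q)) & 1 and (b >> (n - 2 - q)) & 1:
                cz[b, b] = -1.0
        U = cz @ U
    return U

def cost(theta, n, L, obs):
    """f(theta) = <0| U(theta)^dag O U(theta) |0>."""
    psi = np.zeros(2 ** n); psi[0] = 1.0
    for l in range(L):
        psi = layer_unitary(theta[l], n) @ psi
    return float(psi @ obs @ psi)

def grad_norm_sq(n, L, samples=20):
    """Monte Carlo estimate of E_theta ||grad f||^2 (parameter-shift rule)."""
    obs = np.diag([1.0, -1.0])
    for _ in range(n - 1):
        obs = np.kron(obs, np.eye(2))           # O = Z on qubit 0
    total = 0.0
    for _ in range(samples):
        theta = rng.uniform(0, 2 * np.pi, size=(L, n))
        for l in range(L):
            for q in range(n):
                tp, tm = theta.copy(), theta.copy()
                tp[l, q] += np.pi / 2
                tm[l, q] -= np.pi / 2
                g = 0.5 * (cost(tp, n, L, obs) - cost(tm, n, L, obs))
                total += g ** 2
    return total / samples

for n in [2, 3, 4, 5]:
    print(n, "qubits:", grad_norm_sq(n, L=4))
```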

3.2 Trainability

Known BP Results

3.2 Trainability

Bad news for QML:

  • Flat loss landscape.

  • Extremely low tolerance to noise.

3.2 Trainability

Contribution 1: BP-free architecture

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 \geq c(\rho_{in})\, 2^{-3LS},
\text{where } \bm{\theta} \sim \text{Unif}[0,2\pi].
[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Toward Trainability of Deep Quantum Neural Networks. [arXiv:2112.15002]

3.2

Trainability

Binary classification on the wine dataset (N=13)

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Toward Trainability of Deep Quantum Neural Networks. [arXiv:2112.15002]

3.2 Trainability

Contribution 2: Initialization matters

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 \geq c(\rho_{in})\, \text{poly}(L)^{-1},
\text{where } \bm{\theta} \sim N\left(0, \tfrac{1}{4S(L+2)}\right).

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Submitted to NeurIPS 2022.
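A one-line comparison of the two initialization schemes, with \(S\) and \(L\) as in the bound (the concrete values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

L, S, d = 10, 4, 200   # illustrative depth L, block parameter S, d parameters

theta_uniform = rng.uniform(0, 2 * np.pi, size=d)       # standard uniform init
sigma = np.sqrt(1.0 / (4 * S * (L + 2)))                # variance 1/(4S(L+2))
theta_gauss = rng.normal(0.0, sigma, size=d)            # proposed Gaussian init

print("uniform  std:", theta_uniform.std())
print("gaussian std:", theta_gauss.std())
```

The Gaussian initialization concentrates the angles near zero, so the circuit starts close to the identity; the theorem above says this suffices to avoid exponentially vanishing gradients.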

3.2

Trainability

Finding the ground energy of the Ising model (N=15, L=10)

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Submitted to NIPS 2022.
[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021). [arXiv:2007.12369]
\bm{\theta}^*= \arg \min_{\bm{\theta}\in\mathcal{C}} \mathcal{L}(\bm{\theta},\bm{z})
\mathcal{L}(\bm{\theta},\bm{z}):= \frac{1}{n}\sum_{j=1}^n \ell(y_j, \hat{y}_j) + r(\bm{\theta})
R_1\left(\bm{\theta}^{(T)}\right) := \mathbb{E} \left\|\nabla \mathcal{L}(\bm{\theta}^{(T)})\right\|^2
R_2\left(\bm{\theta}^{(T)}\right) := \mathbb{E}[\mathcal{L}(\bm{\theta}^{(T)})] - \mathcal{L}(\bm{\theta}^*)
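A minimal classical instance of this objective, assuming a squared loss \(\ell\), an \(\ell_2\) regularizer \(r(\bm{\theta})\), and a toy linear model; the final squared gradient norm is a proxy for \(R_1\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 0.01                                   # regularization weight

def loss(theta):
    """L(theta) = (1/n) sum_j l(y_j, y_hat_j) + r(theta)."""
    return np.mean((y - X @ theta) ** 2) + lam * theta @ theta

def grad(theta):
    return -2.0 / n * X.T @ (y - X @ theta) + 2 * lam * theta

# T steps of gradient descent; R_1 tracks E||grad L(theta^(T))||^2.
theta = rng.normal(size=d)
for _ in range(200):
    theta -= 0.05 * grad(theta)
print("R_1 proxy ||grad L||^2:", np.sum(grad(theta) ** 2))
print("final loss:", loss(theta))
```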

3.2 Trainability

Contribution 3: Trainability in ERM

R_1\left(\bm{\theta}^{(T)}\right) := \mathbb{E} \left\|\nabla \mathcal{L}(\bm{\theta}^{(T)})\right\|^2
R_1 \leq \tilde{O}\left(\text{poly}\left(\frac{d}{T(1-p)^{L_Q}}, \frac{d}{BK(1-p)^{L_Q}} \right) \right)

\(d\)= \(|\bm{\theta}|\)

\(T\)= # of iterations

\(L_Q\)= circuit depth

\(p\)= error rate

\(K\)= # of measurements

\(B\)= batch size

3.2 Trainability

R_2\left(\bm{\theta}^{(T)}\right) := \mathbb{E}[\mathcal{L}(\bm{\theta}^{(T)})] - \mathcal{L}(\bm{\theta}^*)
R_2\leq \tilde{O}\left( \text{poly}\left(\frac{d}{K^2B (1-p)^{L_Q}} ,\frac{d}{(1-p)^{L_Q}}\right) \right)

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021). [arXiv:2007.12369]



3.3 Generalization

"Generalization refers to the model's ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to train the model."

[1] S. Arunachalam, A. B. Grilo, and H. Yuen, arXiv:2002.08240 (2020).

3.3 Generalization

Separation between Learning models

3.3 Generalization

Contribution:

"Noisy QNN can efficiently simulate QSQ oracle."

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021). [arXiv:2007.12369]

4. Noise

Classical: 1 error per 6 months in a 128MB PC100 SDRAM (2009)

Quantum: 1 error per second per qubit (2021)

4.1 Error Mitigation

(\bm{\theta}^*,\bm{a}^*)= \arg \min_{\bm{\theta}\in\mathcal{C},\bm{a}\in\mathcal{A}} \mathcal{L}(\bm{\theta},\bm{a}, \mathcal{E}_{\bm{a}})

\(\mathcal{C}\): the collection of all parameters

\(\mathcal{A}\): the collection of all possible circuit architectures

\(\mathcal{E}_{\bm{a}}\): the error for architecture \(\bm{a}\)

[1] Yuxuan Du, Tao Huang, Shan You, Min-Hsiu Hsieh, Dacheng Tao. Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers. arXiv:2010.10217 (2020).
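A hedged sketch of the joint minimization this objective suggests: sample candidate architectures \(\bm{a}\) from \(\mathcal{A}\), train \(\bm{\theta}\) under each architecture's noise, and keep the best pair. The loss below is a synthetic stand-in for \(\mathcal{L}(\bm{\theta},\bm{a},\mathcal{E}_{\bm{a}})\), not the paper's quantum evaluator:

```python
import numpy as np

rng = np.random.default_rng(6)

GATES = ["rx", "ry", "rz", "cz"]             # illustrative gate alphabet

def sample_architecture(depth=6):
    """Draw a candidate circuit description a from the search space A."""
    return [str(rng.choice(GATES)) for _ in range(depth)]

def noisy_loss(theta, arch):
    """Stand-in for L(theta, a, E_a): noise E_a grows with the number of
    two-qubit gates, degrading the best achievable loss."""
    noise = 0.05 * arch.count("cz")
    signal = np.mean(np.cos(theta))
    return 1.0 - (1.0 - noise) * signal

def train(arch, steps=100, lr=0.2):
    theta = rng.uniform(-0.5, 0.5, size=len(arch))
    for _ in range(steps):
        g = np.array([(noisy_loss(theta + 1e-4 * e, arch)
                       - noisy_loss(theta - 1e-4 * e, arch)) / 2e-4
                      for e in np.eye(len(arch))])
        theta -= lr * g                      # finite-difference gradient step
    return theta, noisy_loss(theta, arch)

# Random search over architectures; keep the (theta*, a*) with lowest loss.
candidates = [sample_architecture() for _ in range(10)]
results = [(train(a)[1], a) for a in candidates]
best_loss, best_arch = min(results, key=lambda t: t[0])
print("best architecture:", best_arch, "loss:", round(best_loss, 4))
```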

Experiment: hydrogen simulation with error mitigation (EM).

[1] Yuxuan Du, Tao Huang, Shan You, Min-Hsiu Hsieh, Dacheng Tao. Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers. arXiv:2010.10217 (2020).

Could noise become useful in QML?

YES!

4.2 Harnessing Noise

  • Robustness

  • Privacy

4.2.1 Providing Privacy

Differential Privacy (DP)

Classical DP is well studied; however, quantum DP is not.

​[1] Li Zhou and Mingsheng Ying. Differential privacy in quantum computation. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 249–262. IEEE, 2017. 
[2] Scott Aaronson and Guy N Rothblum. Gentle measurement of quantum states and differential privacy. Proceedings of ACM STOC 2019.
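For orientation, the textbook classical construction behind the definition: the Laplace mechanism releases a query answer with \(\epsilon\)-DP by adding noise scaled to the query's sensitivity (a standard example, not taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(7)

def laplace_mechanism(data, query, sensitivity, eps):
    """Release query(data) with eps-differential privacy: adding or removing
    one record moves query(data) by at most `sensitivity`, and Laplace noise
    of scale sensitivity/eps makes the two output distributions e^eps-close."""
    return query(data) + rng.laplace(scale=sensitivity / eps)

ages = np.array([34, 41, 29, 52, 47, 38])
# Counting query; its sensitivity is 1 (one person changes the count by <= 1).
noisy_count = laplace_mechanism(ages, lambda d: float(np.sum(d > 40)), 1.0, eps=0.5)
print("private count of ages > 40:", noisy_count)
```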

4.2.1 Providing Privacy

Regression + DP

Contribution:

1. The first quantum DP algorithm.

2. The same privacy guarantee as the best classical DP algorithm.

3. A substantial runtime improvement.

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. Quantum differentially private sparse regression learning. arXiv:2007.11921 (2020).

4.2.2 Robustness

Adversarial attack

[1] Lu et al., "Quantum Adversarial Machine Learning". [arXiv:2001.00030]

Adversarial robustness

Contribution:

1. An explicit relation between the noise rate \(p\) and the robustness bound \(\tau\).

2. Depolarizing noise suffices.

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao, Nana Liu. Quantum noise protects quantum classifiers against adversaries. Physical Review Research 3, 023153 (2021). [arXiv:2003.09416]
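A single-qubit sketch of the mechanism: depolarizing noise with rate \(p\) contracts every pair of states toward \(I/d\), shrinking their trace distance by exactly \(1-p\); this limits how much an adversarial perturbation can change the classifier's output (illustrative numbers):

```python
import numpy as np

def depolarize(rho, p):
    """Depolarizing channel: rho -> (1 - p) rho + p I/d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d

def trace_distance(a, b):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(a - b)))

# A state and an adversarially perturbed version of it (single qubit).
rho = np.array([[1.0, 0.0], [0.0, 0.0]])                    # |0><0|
plus = np.array([[0.5, 0.5], [0.5, 0.5]])                   # |+><+|
sigma = 0.9 * rho + 0.1 * plus                              # perturbed state

for p in [0.0, 0.3, 0.6]:
    d0 = trace_distance(rho, sigma)
    d1 = trace_distance(depolarize(rho, p), depolarize(sigma, p))
    print(f"p={p}: distance {d0:.4f} -> {d1:.4f} (factor {d1 / d0:.2f})")
```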


Thank you for your attention!
