Min-Hsiu Hsieh

Hon Hai (Foxconn) Quantum Computing Research Center

Challenge and Opportunity in Quantum Machine Learning

PME Special Quantum Seminar

Quantum Machine Learning


Why Quantum Computing?

Approximating the Jones polynomial is "BQP-complete".

Vaughan Jones - 1990 Fields Medal

- Aharonov, Jones, Landau, STOC 2006.

Why Machine Learning?

- Unknown function: \(f: X\to Y\)

- Training data: \(\{(x_i,y_i)\}_{i=1}^N\)

- Hypothesis set: \(\mathcal{H}\)

- Learning algorithm: outputs the hypothesis \(\hat{f}\)

- Two costs: computational complexity and sample complexity
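To make the diagram concrete, here is a minimal classical sketch of the pipeline, assuming a toy sine target, a polynomial hypothesis set, and least squares as the learning algorithm (all illustrative choices, not from the talk):

```python
# Minimal sketch of the supervised-learning pipeline above (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def target(x):                      # the unknown function f: X -> Y
    return np.sin(3 * x)

# Training data {(x_i, y_i)}_{i=1}^N; N is the knob behind sample complexity
N = 50
xs = rng.uniform(-1, 1, N)
ys = target(xs)

# Hypothesis set H: polynomials of degree <= 5
def learn(xs, ys, degree=5):
    # Learning algorithm: least squares over H; its runtime is the
    # computational complexity of learning
    return np.poly1d(np.polyfit(xs, ys, degree))

f_hat = learn(xs, ys)               # the returned hypothesis f_hat
test = rng.uniform(-1, 1, 1000)
print("test MSE:", np.mean((f_hat(test) - target(test)) ** 2))
```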


Quantum Ingredients

Quantum Advantage

Crossing the type of input (Classical or Quantum) with the type of algorithm (Classical or Quantum) gives four categories: CC, CQ, QC, and QQ.
  • Linear Equation Solvers

  • Perceptron

  • Recommendation Systems

  • Semidefinite Programming

  • Many others (such as non-convex optimization)

  • State Tomography

  • Entanglement Structure

  • Quantum Control

Could QML achieve better end-to-end runtime?

QML Process

1. Read-in
2. Readout
3. Learning Machines
4. Noise

Many Challenges!

QRAM

- V. Giovannetti, S. Lloyd, L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).
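One way to see what the QRAM primitive buys: the lookup \(|i\rangle|b\rangle \to |i\rangle|b\oplus x_i\rangle\) is a permutation of basis states (hence unitary), and querying a uniform superposition of addresses returns all memory cells in superposition. A toy numpy sketch, assuming a 4-cell memory; real QRAM proposals (e.g., the bucket-brigade architecture of [1]) are built very differently:

```python
# Toy simulation of the QRAM lookup |i>|0> -> |i>|x_i> on small registers.
import numpy as np

data = [3, 0, 2, 1]                 # classical memory cells x_i (each < 2**m)
n, m = 2, 2                         # address qubits / data qubits
N, M = 2**n, 2**m

# Build the lookup unitary with XOR semantics |i>|b> -> |i>|b XOR x_i>
U = np.zeros((N * M, N * M))
for i in range(N):
    for b in range(M):
        U[i * M + (b ^ data[i]), i * M + b] = 1.0

# Query all addresses at once: (1/sqrt(N)) sum_i |i>|0>
psi = np.zeros(N * M)
psi[::M] = 1 / np.sqrt(N)
out = U @ psi
for idx in np.flatnonzero(out):     # prints the |i>|x_i> branches
    print(f"|{idx // M}>|{idx % M}>  amplitude {out[idx]:.3f}")
```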

1. Read-in

Input oracles for distributions

[1] Aleksandrs Belovs. Quantum Algorithms for Classical Probability Distributions. 27th Annual European Symposium on Algorithms (ESA 2019), pp. 16:1–16:11.
O_p |0\rangle =\sum_{x\in\mathcal{X}} \sqrt{p_x} |x\rangle_A\otimes|\phi_x\rangle_B
O_p |0\rangle =\sum_{x\in\mathcal{X}} \sqrt{p_x} |x\rangle
O_{s} |0\rangle =\sum_{x\in\mathcal{X}} n^{-1/2} |x\rangle \otimes |\#_s(x)\rangle
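For intuition, the state prepared by \(O_p\) can be written down directly: the amplitudes \(\sqrt{p_x}\) return \(p\) under the Born rule. A short check (illustrative only, not Belovs' construction):

```python
# Build O_p|0> = sum_x sqrt(p_x)|x> classically and verify the Born rule.
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # a distribution on X = {0,1,2,3}
psi = np.sqrt(p)                           # amplitude sqrt(p_x) on |x>

assert np.isclose(np.linalg.norm(psi), 1.0)        # valid quantum state
print("measurement probabilities:", psi**2)        # recovers p exactly
```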

1. Read-in

There is no general read-in protocol (with a runtime guarantee) for arbitrary datasets.

2. Readout

State tomography requires \(O(\frac{rd}{\epsilon^2})\) copies of \(\rho\), where

- \(r\) is the rank of \(\rho\).
- \(d\) is the dimension.

Observation:

For ML problems, the input and output obey structural relationships. For example, the solution of \(A\bm{x}=\bm{b}\) lies in the row space of \(A = \sum_{i} \sigma_i \bm{u}_i \bm{v}^\dagger_i\).
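This observation is easy to check numerically: the minimum-norm solution of \(A\bm{x}=\bm{b}\) lies in row\((A)\), the span of the right singular vectors. A small sketch with an arbitrary rank-2 example:

```python
# Verify that the minimum-norm solution of A x = b lies in row(A).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2
b = A @ rng.standard_normal(4)                                 # consistent b

x = np.linalg.pinv(A) @ b            # minimum-norm solution
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
V_row = Vt[:r]                        # orthonormal basis of row(A)

# x equals its projection onto row(A): only r numbers are really unknown
print(np.allclose(V_row.T @ (V_row @ x), x))       # True
```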

2. Readout

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Efficient State Read-out for Quantum Machine Learning Algorithms. Physical Review Research 3, 043095 (2021) [arXiv:2004.06421].

2. Readout

Theorem: Given

\(-\) input \(A\in\mathbb{R}^{m\times n}\) of rank \(r\),

\(-\) output \(\bm{v}\in\text{row}(A)\), and

\(-\) access to QRAM,

reading out \(\bm{v}\) requires only poly\((r,\epsilon^{-1})\) queries to QRAM.

Proof:

1. \(|v\rangle = \sum_{i=1}^r x_i |A_{g(i)}\rangle\in\text{row}(A)\)

2. quantum Gram–Schmidt process algorithm to construct \(\{A_{g(i)}\}\)

3. Obtain \(\{x_i\}\).
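A classical analogue of steps 1-3 shows why this helps: once \(r\) independent rows \(A_{g(i)}\) are located by a Gram-Schmidt pass, \(\bm{v}\) is pinned down by just \(r\) coefficients instead of full tomography over the dimension. A sketch under these assumptions (not the paper's quantum algorithm):

```python
# Classical analogue of the read-out: express v in row(A) with r numbers.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))   # rank 3
v = A.T @ rng.standard_normal(8)                                # v in row(A)

# Gram-Schmidt over the rows to locate an independent subset {A_{g(i)}}
basis, chosen = [], []
for i, row in enumerate(A):
    w = row - sum((row @ q) * q for q in basis)
    if np.linalg.norm(w) > 1e-10:
        basis.append(w / np.linalg.norm(w))
        chosen.append(i)

# Solve v = sum_i x_i A_{g(i)} for the r coefficients {x_i}
R = A[chosen]                                   # the r independent rows
x = np.linalg.solve(R @ R.T, R @ v)
print("r =", len(chosen), "| reconstruction ok:", np.allclose(R.T @ x, v))
```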


3. Learning Machines

A learning model has three aspects: Expressivity, Trainability, and Generalization.

3.1 Expressivity

"how the architectural properties of a neural network (depth, width, layer type) affect the resulting functions it can compute"

[1] On the Expressive Power of Deep Neural Networks. ICML 2017. arXiv:1606.05336.

[Figure: expressivity comparison between parameterized quantum circuits and classical models]

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao. The Expressive Power of Parameterized Quantum Circuits. Physical Review Research 2, 033125 (2020) [arXiv:1810.11922].

"How easy is it to find the appropriate weights of the neural networks that fit the given data?"

3.2

Trainability

3.2

Trainability

Barren Plateau (BP) problem:

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 = \epsilon \leq 2^{-\text{poly}(n)},
\text{where } f(\bm{\theta},\rho) = \text{Tr}[O U(\bm{\theta})\rho U(\bm{\theta})^\dagger].
[1] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum neural network training landscapes. Nature Communications, 9(1):1–6, 2018.
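The effect is straightforward to reproduce numerically: for random RY/CZ circuits with a single-qubit observable, the variance of one partial derivative (via the parameter-shift rule) shrinks as the qubit count grows. A small statevector sketch; the depth, observable, and sample counts are arbitrary choices, not those of [1]:

```python
# Numerical illustration of a barren plateau with random RY/CZ circuits.
import numpy as np

rng = np.random.default_rng(3)

def apply_ry(psi, theta, q, n):
    # RY(t) = [[cos(t/2), -sin(t/2)], [sin(t/2), cos(t/2)]] on qubit q
    psi = psi.reshape([2] * n)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    a, b = np.take(psi, 0, axis=q), np.take(psi, 1, axis=q)
    return np.stack([c * a - s * b, s * a + c * b], axis=q).reshape(-1)

def apply_cz(psi, q1, q2, n):
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1.0          # phase flip on |..1..1..>
    return psi.reshape(-1)

def f(thetas, n, layers):
    # f(theta) = Tr[O U(theta) rho U(theta)^dagger], rho = |0...0><0...0|
    psi = np.zeros(2**n); psi[0] = 1.0
    t = iter(thetas)
    for _ in range(layers):
        for q in range(n):
            psi = apply_ry(psi, next(t), q, n)
        for q in range(n - 1):
            psi = apply_cz(psi, q, q + 1, n)
    z = np.where(np.arange(2**n) < 2**(n - 1), 1.0, -1.0)  # O = Z on qubit 0
    return float(np.sum(z * np.abs(psi) ** 2))

for n in [2, 4, 6]:
    layers, grads = 8, []
    for _ in range(100):             # random parameter draws
        th = rng.uniform(0, 2 * np.pi, n * layers)
        shift = np.zeros_like(th); shift[0] = np.pi / 2
        # parameter-shift rule for the first RY angle
        grads.append((f(th + shift, n, layers) - f(th - shift, n, layers)) / 2)
    print(f"n={n}: Var[df/dtheta_1] ~ {np.var(grads):.5f}")
```

The printed variances decay with \(n\), matching the \(2^{-\text{poly}(n)}\) behavior above.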

3.2 Trainability

Known BP Results

3.2 Trainability

Bad News for QML

  • Flat loss landscape.

  • Extremely small tolerance to noise.

3.2 Trainability

Contribution 1: BP-free architecture

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 \geq c(\rho_{\text{in}})\, 2^{-3LS},
\text{where } \bm{\theta} \sim \text{Uniform}[0,2\pi].
[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Toward Trainability of Deep Quantum Neural Networks. [arXiv:2112.15002]

3.2 Trainability

Binary classification on the wine dataset (N=13).

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Toward Trainability of Deep Quantum Neural Networks. [arXiv:2112.15002]

3.2 Trainability

Contribution 2: Initialization Matters

\mathbb{E}_{\bm{\theta}}\|\nabla_{\bm{\theta}} f \|^2 \geq c(\rho_{\text{in}})\, \text{poly}(L)^{-1},
\text{where } \bm{\theta} \sim N\left(0, \frac{1}{4S(L+2)}\right).
[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Submitted to NeurIPS 2022.
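A minimal sketch of what this initialization looks like; \(S\) and \(L\) below are arbitrary placeholders, not the paper's settings. Intuitively, small initial angles keep each gate near the identity, which is what allows the \(\text{poly}(L)^{-1}\) gradient lower bound:

```python
# Sample theta ~ N(0, 1/(4S(L+2))) and compare with uniform initialization.
import numpy as np

rng = np.random.default_rng(4)
S, L, d = 3, 10, 200                 # toy block size, depth, #parameters

sigma = np.sqrt(1.0 / (4 * S * (L + 2)))
theta_gauss = rng.normal(0.0, sigma, d)
theta_unif = rng.uniform(0.0, 2 * np.pi, d)

print("Gaussian init std:", float(theta_gauss.std()))
print("Uniform  init std:", float(theta_unif.std()))
```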

3.2 Trainability

Finding the ground energy of the Ising model (N=15, L=10).

[1] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, Dacheng Tao. Submitted to NeurIPS 2022.
3.2 Trainability

Contribution 3: Trainability in ERM

\bm{\theta}^* = \arg\min_{\bm{\theta}\in\mathcal{C}} \mathcal{L}(\bm{\theta},\bm{z}),
\text{where } \mathcal{L}(\bm{\theta}) := \frac{1}{n}\sum_{j=1}^n \ell(y_j, \hat{y}_j) + r(\bm{\theta}).

R_1\left(\bm{\theta}^{(T)}\right) := \mathbb{E} \left\|\nabla \mathcal{L}(\bm{\theta}^{(T)})\right\|^2 \leq \tilde{O}\left(\text{poly}\left(\frac{d}{T(1-p)^{L_Q}}, \frac{d}{BK(1-p)^{L_Q}} \right) \right)

\(d\) = \(|\bm{\theta}|\), the number of trainable parameters

\(T\) = # of iterations

\(L_Q\) = circuit depth

\(p\) = error rate

\(K\) = # of measurements

\(B\) = batch size

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021) [arXiv:2007.12369].

R_2\left(\bm{\theta}^{(T)}\right) := \mathbb{E}[\mathcal{L}(\bm{\theta}^{(T)})] - \mathcal{L}(\bm{\theta}^*) \leq \tilde{O}\left(\text{poly}\left(\frac{d}{K^2 B (1-p)^{L_Q}}, \frac{d}{(1-p)^{L_Q}}\right) \right)

with \(d\), \(T\), \(L_Q\), \(p\), \(K\), and \(B\) as above.

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021) [arXiv:2007.12369].
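A toy model of how the two bounds behave: gradient descent on a quadratic loss whose gradient is damped by \((1-p)^{L_Q}\) (depolarizing noise) and perturbed by shot noise of scale \(1/\sqrt{BK}\). The loss and all constants are stand-ins, not the setting of [1]:

```python
# Toy noisy-gradient descent tracking the R1 quantity (gradient norm^2).
import numpy as np

rng = np.random.default_rng(5)
d, T, L_Q, p, K, B = 8, 200, 20, 0.01, 100, 32
eta = 0.1
damp = (1 - p) ** L_Q                # noise-induced gradient damping

theta = rng.standard_normal(d)
theta_star = rng.standard_normal(d)  # minimizer of the toy quadratic loss

def grad(theta):                     # exact gradient of L(theta)
    return theta - theta_star

for _ in range(T):
    noisy = damp * grad(theta) + rng.standard_normal(d) / np.sqrt(B * K)
    theta -= eta * noisy

print("final ||grad L||^2 (R1 proxy):", float(np.linalg.norm(grad(theta))**2))
```

Raising \(p\) or \(L_Q\), or lowering \(B\) and \(K\), leaves the final gradient norm at a higher floor after the same \(T\) steps, mirroring the dependence in the bounds.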

3.3 Generalization

"Generalization refers to the model's ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to train the model."

[1] S. Arunachalam, A. B. Grilo, and H. Yuen, arXiv:2002.08240 (2020).

3.3 Generalization

Separation between learning models

Contribution: "A noisy QNN can efficiently simulate the QSQ oracle."

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. On the learnability of quantum neural networks. PRX Quantum 2, 040337 (2021) [arXiv:2007.12369].

Classical: 1 error per 6 months in a 128 MB PC100 SDRAM (2009)
Quantum: 1 error per second per qubit (2021)

4. Noise

4.1 Error Mitigation

(\bm{\theta}^*,\bm{a}^*) = \arg\min_{\bm{\theta}\in\mathcal{C},\,\bm{a}\in\mathcal{A}} \mathcal{L}(\bm{\theta},\bm{a}, \mathcal{E}_{\bm{a}})

\(\mathcal{C}\): The collection of all parameters

\(\mathcal{A}\): The collection of all possible circuit architectures

\(\mathcal{E}_{\bm{a}}\): The error for architecture \(\bm{a}\)

Hydrogen simulation + error mitigation

[1] Yuxuan Du, Tao Huang, Shan You, Min-Hsiu Hsieh, Dacheng Tao. Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers. arXiv:2010.10217 (2020).

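A minimal random-search sketch of this objective: sample candidate architectures \(\bm{a}\in\mathcal{A}\), crudely optimize \(\bm{\theta}\) for each under a noise-aware loss, and keep the best pair. The loss model and search strategy are stand-ins for the QAS procedure of [1]:

```python
# Random architecture search over a toy noise-aware loss (illustrative).
import numpy as np

rng = np.random.default_rng(6)

def noisy_loss(theta, arch, p=0.02):
    depth = arch.sum()                       # more gates => more noise
    return (1 - p) ** depth * np.cos(theta * (1 + arch)).sum() / len(arch)

best = (np.inf, None)
for _ in range(50):                          # sample candidate architectures
    arch = rng.integers(0, 2, size=6)        # a: which gates to include
    theta = rng.uniform(0, 2 * np.pi, size=6)
    for _ in range(100):                     # crude local search over theta
        cand = theta + rng.normal(0, 0.1, size=6)
        if noisy_loss(cand, arch) < noisy_loss(theta, arch):
            theta = cand
    val = noisy_loss(theta, arch)
    if val < best[0]:
        best = (val, arch)

print("best loss:", round(best[0], 4), "| architecture mask:", best[1])
```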

Could noise become useful in QML? YES!

4.2 Harnessing Noise

[Privacy]

[Robustness]

4.2.1 Providing Privacy

Differential Privacy (DP)

Classical DP is well studied; quantum DP is not.

[1] Li Zhou and Mingsheng Ying. Differential privacy in quantum computation. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 249–262. IEEE, 2017.
[2] Scott Aaronson and Guy N. Rothblum. Gentle measurement of quantum states and differential privacy. Proceedings of ACM STOC 2019.
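For reference, the classical mechanism these quantum definitions generalize: the Laplace mechanism releases \(f(D)\) plus Lap\((\Delta f/\epsilon)\) noise and is \(\epsilon\)-DP. A minimal sketch with an arbitrary counting query:

```python
# Classical epsilon-DP via the Laplace mechanism (counting query).
import numpy as np

rng = np.random.default_rng(7)

def laplace_mechanism(value, sensitivity, epsilon):
    # Lap(sensitivity/epsilon) noise yields epsilon-differential privacy
    return value + rng.laplace(0.0, sensitivity / epsilon)

D = rng.integers(0, 2, size=1000)     # one private bit per individual
# A counting query has sensitivity 1: one person changes it by at most 1
print("true count:   ", int(D.sum()))
print("private count:", laplace_mechanism(D.sum(), 1.0, epsilon=0.5))
```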

4.2.1 Providing Privacy

Regression + DP

Contribution:

1. The first quantum DP algorithm.

2. The same privacy guarantee as the best classical DP algorithm.

3. A huge runtime improvement.

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao. Quantum differentially private sparse regression learning. arXiv:2007.11921 (2020).

4.2.2 Robustness

Adversarial Attack

[1] Lu et al., "Quantum Adversarial Machine Learning". [arXiv:2001.00030]

Adversarial Robustness

Contribution:

1. Explicit relation between \(p\) and \(\tau\).

2. Depolarizing noise suffices.

[1] Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Dacheng Tao, Nana Liu. Quantum noise protects quantum classifiers against adversaries. Physical Review Research 3, 023153 (2021) [arXiv:2003.09416].
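The mechanism can be sketched in a few lines: the depolarizing channel \(\mathcal{E}_p(\rho) = (1-p)\rho + p\,I/d\) contracts every expectation value toward its maximally mixed value, so the classifier's decision margin, and with it an adversary's leverage, shrinks as \(p\) grows. Toy numbers, not the quantitative relation of [1]:

```python
# Depolarizing noise contracts a classifier's decision margin.
import numpy as np

d = 2
rho = np.array([[0.9, 0.3], [0.3, 0.1]])    # a valid (rank-1) input state
O = np.diag([1.0, -1.0])                    # observable deciding the label

def depolarize(rho, p):
    return (1 - p) * rho + p * np.eye(d) / d

for p in [0.0, 0.2, 0.5]:
    margin = float(np.trace(O @ depolarize(rho, p)))
    print(f"p={p:.1f}: decision margin {margin:+.3f}")
```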

Thank you for your attention!
