Learning Quantum Objects

1st International Workshop on

Quantum Software and Quantum Machine Learning (QSML)

Min-Hsiu Hsieh

UTS: Centre for Quantum Software and Information


This talk concerns
  • Complexity of Learning
  • Full Quantum Settings

QIP 2018 Tutorial: Ronald de Wolf (CWI, University of Amsterdam), "Quantum Learning Theory"

Hao-Chung Cheng, MH, Ping-Cheng Yeh. The learnability of unknown quantum measurements. QIC 16(7&8):615–656 (2016).

  • Unknown function: \(f: X\to Y\)
  • Training data: \(\{(x_i,y_i)\}_{i=1}^n\)
  • Hypothesis set: \(\mathcal{H}\)
  • Learning algorithm: outputs a hypothesis \(\hat{f}\in\mathcal{H}\)
  • Costs: computational complexity and sample complexity

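For a quantum state \(\rho\):
  • Unknown function: \(f_\rho : \mathcal{E}(\mathcal{H}) \to \mathbb{R}\), \(f_\rho(E) = \text{Tr}(E\rho)\)
  • Training data: \(\{(E_i,f_\rho(E_i))\}_{i=1}^n\)
  • Hypothesis set: \(\{f_{\rho}:\rho\in \mathcal{D}(\mathcal{H})\}\)

Here \(\mathcal{H}\) is the Hilbert space, \(\mathcal{D}(\mathcal{H})\) the set of density operators, and \(\mathcal{E}(\mathcal{H})\) the set of effects \(0\leq E\leq I\).

Photo Credit: Akram Youssry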

Empirical Risk Minimization

Given training data \(\{(x_i,y_i)\}_{i=1}^n\) with \(y_i=f(x_i)\) for an unknown function \(f: X\to Y\), a hypothesis set \(\mathcal{H}\), and a loss function \(\ell:Y\times Y \to \mathbb{R}\), find

f_n = \arg \min_{h\in \mathcal{H}} R_n (h)

where

R_n(h) = \frac{1}{n}\sum_{i=1}^n \ell (h(x_i), y_i)
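A minimal sketch of ERM over a finite hypothesis set, assuming (for illustration only) threshold classifiers \(h_t(x)=1[x\geq t]\) on \([0,1]\), the 0-1 loss, and labels produced by a hypothetical target threshold 0.3. The helper names `empirical_risk`, `erm` and `threshold` are illustrative, not from the talk.

```python
import numpy as np

def empirical_risk(h, xs, ys, loss):
    """R_n(h) = (1/n) * sum_i loss(h(x_i), y_i)."""
    return float(np.mean([loss(h(x), y) for x, y in zip(xs, ys)]))

def erm(hypotheses, xs, ys, loss):
    """Return the first hypothesis in a finite list minimizing R_n, and its risk."""
    risks = [empirical_risk(h, xs, ys, loss) for h in hypotheses]
    best = int(np.argmin(risks))
    return hypotheses[best], risks[best]

def zero_one(a, b):
    """0-1 loss: 1 if the prediction differs from the label, else 0."""
    return float(a != b)

def threshold(t):
    """Hypothetical hypothesis h_t(x) = 1[x >= t]."""
    return lambda x: int(x >= t)

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=200)
ys = [threshold(0.3)(x) for x in xs]          # unknown f = h_{0.3} (assumption)
hypotheses = [threshold(t) for t in np.linspace(0.0, 1.0, 101)]
f_n, risk = erm(hypotheses, xs, ys, zero_one)
print(risk)
```

Because the target lies in the grid, ERM reaches zero in-sample error here; with a target outside \(\mathcal{H}\) it would return the best available approximation.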

Out-of-Sample Error

R(h) = \mathbb{E}_{X\sim\mu} [\ell (h(X), f(X)) ]

In-Sample Error

R_n(h) = \frac{1}{n}\sum_{i=1}^n \ell (h(x_i), y_i), \quad x_i\sim\mu \text{ i.i.d.},\ y_i=f(x_i)

Since \(f_n\in\mathcal{H}\), the gap for the ERM output is controlled uniformly over the hypothesis set (with high probability):

|R(f_n) - R_n(f_n)| \leq \sup_{h\in\mathcal{H}}|R(h) - R_n(h)| \leq \text{Bound}(n, \mathcal{H})
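To make \(\sup_{h}|R(h)-R_n(h)|\) concrete, here is a sketch under assumptions not in the talk: \(\mu\) uniform on \([0,1]\), target \(f=h_{0.5}\), threshold hypotheses \(h_t(x)=1[x\geq t]\) and 0-1 loss, so that the out-of-sample error is exactly \(R(h_t)=|t-0.5|\). The function `sup_gap` is a hypothetical helper.

```python
import numpy as np

def sup_gap(n, thresholds, target=0.5, rng=None):
    """sup_h |R(h) - R_n(h)| over threshold classifiers h_t(x) = 1[x >= t].

    Assumptions (illustrative): mu = Uniform[0, 1], f = h_target, 0-1 loss,
    hence the true risk is R(h_t) = |t - target|.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 1.0, size=n)
    y = x >= target
    gaps = []
    for t in thresholds:
        r_true = abs(t - target)            # out-of-sample error
        r_emp = np.mean((x >= t) != y)      # in-sample error
        gaps.append(abs(r_true - r_emp))
    return max(gaps)

thresholds = np.linspace(0.0, 1.0, 201)
rng = np.random.default_rng(1)
for n in (10, 100, 1000, 10000):
    print(n, round(sup_gap(n, thresholds, rng=rng), 4))
```

The printed gap should shrink roughly like \(1/\sqrt{n}\) for this simple class.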

Probably Approximately Correct (PAC) Learnable

\(\mathcal{H}\) is PAC learnable if for any \(\epsilon>0\),

$$ \lim_{n\to\infty}\sup_{\mu} \Pr\left\{\sup_{h\in\mathcal{H}}|R(h) - R_n(h)| >\epsilon\right\} = 0.$$
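The probability in this definition can be estimated by Monte Carlo for a single fixed \(\mu\) (the definition takes a supremum over all \(\mu\), which simulation cannot do). This sketch reuses the illustrative threshold class with \(\mu\) uniform on \([0,1]\); `tail_prob` and its trial count are assumptions.

```python
import numpy as np

def sup_gap(x, thresholds, target=0.5):
    """sup_t |R(h_t) - R_n(h_t)| for h_t(x) = 1[x >= t], f = h_target, 0-1 loss.

    Under mu = Uniform[0, 1] the true risk is |t - target|.
    """
    y = x >= target
    r_emp = np.array([np.mean((x >= t) != y) for t in thresholds])
    r_true = np.abs(thresholds - target)
    return np.max(np.abs(r_true - r_emp))

def tail_prob(n, eps, trials=200, seed=0):
    """Monte Carlo estimate of Pr{ sup_h |R(h) - R_n(h)| > eps } for one fixed mu."""
    rng = np.random.default_rng(seed)
    thresholds = np.linspace(0.0, 1.0, 101)
    hits = sum(sup_gap(rng.uniform(size=n), thresholds) > eps for _ in range(trials))
    return hits / trials

for n in (10, 100, 2000):
    print(n, tail_prob(n, eps=0.1))
```

The estimates should decrease toward 0 as \(n\) grows, which is the behavior the limit requires.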

Sample Complexity

The sample complexity \(m_\mathcal{H}(\epsilon,\delta)\) is the smallest integer such that, for every \(n\geq m_\mathcal{H}(\epsilon,\delta)\),

$$\sup_{\mu} \Pr \left\{ \sup_{h\in\mathcal{H}} \big|R(h)-R_n(h)\big|\geq \epsilon \right\}\leq \delta.$$

For Boolean function classes \(\mathcal{H}\) [1, 2],

$$m_{\mathcal{H}}(\epsilon,\delta)\leq \frac{C}{\epsilon^2}\left(\text{VCdim}(\mathcal{H})\log\left(\frac{2}{\epsilon}\right)+\log\left(\frac{2}{\delta}\right)\right),$$

where \(C\) is a universal constant.

[1] Vapnik, Springer-Verlag, New York/Berlin, 1982.

[2] Blumer, Ehrenfeucht, Haussler, and Warmuth, J. Assoc. Comput. Mach., vol. 36, no. 4, pp. 929--965, 1989.
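The VC bound above can be evaluated numerically. The constant \(C\) is not specified on the slide, so this sketch sets \(C=1\) as a placeholder and uses the natural logarithm; both are assumptions, and `vc_sample_complexity` is a hypothetical helper name.

```python
import math

def vc_sample_complexity(vcdim, eps, delta, C=1.0):
    """Ceil of (C/eps^2) * (VCdim * log(2/eps) + log(2/delta)).

    C = 1 is a placeholder for the unspecified universal constant;
    log is the natural logarithm.
    """
    return math.ceil(C / eps**2 * (vcdim * math.log(2 / eps) + math.log(2 / delta)))

print(vc_sample_complexity(3, 0.1, 0.05))   # 1268
```

Halving \(\epsilon\) roughly quadruples the bound, reflecting the \(1/\epsilon^2\) factor.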

VC Dimension

Example: \(\mathcal{X}=\mathbb{R}^2\), \(\mathcal{H}=\{f:\mathcal{X}\to\{0,1\} \text{ linear}\}\)
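A brute-force shattering check for this example, assuming "linear" means affine halfspaces \(f(x)=1[w\cdot x+b>0]\). The test is exact in \(\mathbb{R}^2\): the ordering of projections \(w\cdot x_i\) changes only when \(w\) is perpendicular to some \(x_i-x_j\), so one direction per arc between those critical angles suffices. It can confirm that particular point sets are or are not shattered; it does not by itself prove that no 4-point set is shattered. `separable` and `shattered` are illustrative names.

```python
import itertools
import math
import numpy as np

def separable(points, labels):
    """True if an affine halfspace {x : w.x + b > 0} realizes the 0/1 labels (R^2)."""
    pos, neg = points[labels == 1], points[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return True  # a constant classifier works
    # Critical directions: w perpendicular to some x_i - x_j (both orientations).
    crit = []
    for i, j in itertools.combinations(range(len(points)), 2):
        dx, dy = points[i] - points[j]
        theta = math.atan2(dy, dx) + math.pi / 2
        crit += [theta % (2 * math.pi), (theta + math.pi) % (2 * math.pi)]
    crit = sorted(set(crit))
    crit.append(crit[0] + 2 * math.pi)  # close the circle
    # The projection order is constant on each open arc; test its midpoint.
    for a, b in zip(crit[:-1], crit[1:]):
        mid = (a + b) / 2
        w = np.array([math.cos(mid), math.sin(mid)])
        if (pos @ w).min() > (neg @ w).max():
            return True
    return False

def shattered(points):
    """True if every 0/1 labeling of the points is realized by some halfspace."""
    points = np.asarray(points, dtype=float)
    return all(separable(points, np.array(lab))
               for lab in itertools.product([0, 1], repeat=len(points)))

print(shattered([(0, 0), (1, 0), (0, 1)]))          # True
print(shattered([(0, 0), (1, 1), (1, 0), (0, 1)]))  # False (XOR labeling)
```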

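For Real functions \(\mathcal{H}\)

\(\mathcal{H}\) \(\epsilon\)-shatters \(\mathcal{S}=\{x_1,\ldots,x_n\}\) if there exist \(r_1,\ldots,r_n\in\mathbb{R}\) such that for every \(b\in\{0,1\}^n\) there is an \(h\in\mathcal{H}\) with \(h(x_i)\geq r_i+\epsilon\) when \(b_i=1\) and \(h(x_i)\leq r_i-\epsilon\) when \(b_i=0\).

The fat-shattering dimension is

fat\(_\mathcal{H}(\epsilon,\mathcal{X})=\sup\{|\mathcal{S}|: \mathcal{S}\subseteq\mathcal{X} \text{ is } \epsilon\text{-shattered by } \mathcal{H}\}\)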
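For a finite class, the \(\epsilon\)-shattering condition can be verified directly once witnesses \(r_i\) are supplied. This sketch assumes the class is given as a table `values[k, i]` \(=h_k(x_i)\); it checks a given witness and does not search for one. `eps_shatters` is a hypothetical name.

```python
import itertools
import numpy as np

def eps_shatters(values, r, eps):
    """Check eps-shattering of x_1..x_n by a finite class with witnesses r.

    values[k, i] = h_k(x_i); pattern b needs some h_k with
    h_k(x_i) >= r_i + eps where b_i = 1 and h_k(x_i) <= r_i - eps where b_i = 0.
    """
    values = np.asarray(values, dtype=float)
    r = np.asarray(r, dtype=float)
    above = values >= r + eps
    below = values <= r - eps
    for b in itertools.product([False, True], repeat=len(r)):
        ok = np.where(np.array(b), above, below).all(axis=1)  # per hypothesis
        if not ok.any():
            return False
    return True

# Four hypotheses on two points; witnesses r = (0.5, 0.5).
values = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(eps_shatters(values, [0.5, 0.5], 0.5))  # True
print(eps_shatters(values, [0.5, 0.5], 0.6))  # False
```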


\(m_{\mathcal{H}}(\epsilon,\delta)\leq \frac{C}{\epsilon^2}\left(\text{fat}_{\mathcal{H}}\left(\frac{\epsilon}{8}\right)\log\left(\frac{2}{\epsilon}\right)+\log\left(\frac{8}{\delta}\right)\right)\)

[1] Bartlett, Long, and Williamson, J. Comput. System Sci., vol. 52, no. 3, pp. 434--452, 1996.

[2] Alon, Ben-David, Cesa-Bianchi, and Haussler, J. ACM, vol. 44, no. 4, pp. 616--631, 1997.

[3] Mendelson, Inventiones Mathematicae, vol. 152, pp. 37--55, 2003.

Sample Complexity for Learning Quantum Objects

Learning a quantum state:
  • Unknown function: \(f_\rho : \mathcal{E}(\mathcal{H}) \to \mathbb{R}\), \(f_\rho(E) = \text{Tr}(E\rho)\)
  • Training data: \(\{(E_i,f_\rho(E_i))\}_{i=1}^n\)
  • Hypothesis set: \(\{f_{\rho}:\rho\in \mathcal{D}(\mathcal{H})\}\)

Learning a measurement:
  • Unknown function: \(f_E : \mathcal{D}(\mathcal{H}) \to \mathbb{R}\), \(f_E(\rho) = \text{Tr}(E\rho)\)
  • Training data: \(\{(\rho_i,f_E(\rho_i))\}_{i=1}^n\)
  • Hypothesis set: \(\{f_{E}:E\in \mathcal{E}(\mathcal{H})\}\)
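A sketch of how the two kinds of training data could be generated in simulation. The random constructions (normalized \(GG^\dagger\) for states, random eigenbasis with eigenvalues in \([0,1]\) for effects) are convenient assumptions, not the talk's data model, and all helper names are hypothetical.

```python
import numpy as np

def random_state(d, rng):
    """Random density matrix rho = G G^dag / Tr(G G^dag), G complex Gaussian."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_effect(d, rng):
    """Random effect 0 <= E <= I: random unitary eigenbasis, eigenvalues in [0, 1]."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q @ np.diag(rng.uniform(0.0, 1.0, size=d)) @ q.conj().T

def f(E, rho):
    """f_rho(E) = f_E(rho) = Tr(E rho), real for Hermitian E and rho."""
    return float(np.trace(E @ rho).real)

rng = np.random.default_rng(0)
d, n = 2, 5
rho = random_state(d, rng)                     # unknown state
state_data = [(E, f(E, rho)) for E in (random_effect(d, rng) for _ in range(n))]
E = random_effect(d, rng)                      # unknown measurement
meas_data = [(r, f(E, r)) for r in (random_state(d, rng) for _ in range(n))]
print([round(y, 3) for _, y in state_data])
```

Every label lies in \([0,1]\), since \(0\leq E\leq I\) and \(\rho\) has unit trace.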

Sample Complexity for Learning Quantum States

fat\(_{\mathcal{D}(\mathcal{H})}(\epsilon,\mathcal{E}(\mathcal{H})) = O(\log d/\epsilon^2)\), where \(d=\dim\mathcal{H}\).

Learning Unknown Measurements

What is the sample complexity of learning unknown measurements?

  • Learning states: fat\(_{\mathcal{D}(\mathcal{H})}(\epsilon,\mathcal{E}(\mathcal{H})) = O(\log d/\epsilon^2)\)
  • Learning measurements: fat\(_{\mathcal{E}(\mathcal{H})}(\epsilon,\mathcal{D}(\mathcal{H})) = O( d/\epsilon^2)\)
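Plugging these fat-shattering bounds into the real-valued sample-complexity bound shows the \(\log d\) versus \(d\) gap. This sketch sets every hidden constant (the \(C\) of the bound and the constants inside the \(O(\cdot)\)) to 1 and uses the natural logarithm; these are assumptions, so only the relative scaling is meaningful. Function names are hypothetical.

```python
import math

def fat_sample_complexity(fat, eps, delta, C=1.0):
    """(C/eps^2) * (fat(eps/8) * log(2/eps) + log(8/delta)), C = 1 placeholder."""
    return C / eps**2 * (fat(eps / 8) * math.log(2 / eps) + math.log(8 / delta))

def fat_states(d):
    """fat(e) = log(d) / e^2, hidden constant set to 1 (assumption)."""
    return lambda e: math.log(d) / e**2

def fat_measurements(d):
    """fat(e) = d / e^2, hidden constant set to 1 (assumption)."""
    return lambda e: d / e**2

for d in (2, 2**5, 2**10):
    s = fat_sample_complexity(fat_states(d), 0.1, 0.05)
    m = fat_sample_complexity(fat_measurements(d), 0.1, 0.05)
    print(d, f"states ~ {s:.3g}", f"measurements ~ {m:.3g}")
```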

Technical Merits

  • The two problems can be solved in the same way.

  • You don't need to know quantum mechanics.

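State - Measurement Duality

\(S_1^d\) and \(S_\infty^d\), the unit balls of the trace norm and the operator norm on \(d\times d\) Hermitian matrices, are polar to each other.

S_1^d=\text{conv}(-\mathcal{D}(\mathbb{C}^d)\cup \mathcal{D}(\mathbb{C}^d))
S_\infty^d=\{2E-I: E\in\mathcal{E}(\mathbb{C}^d)\},\quad \text{conv}(-\mathcal{E}(\mathbb{C}^d)\cup \mathcal{E}(\mathbb{C}^d))\subseteq S_\infty^d\subseteq 2\,\text{conv}(-\mathcal{E}(\mathbb{C}^d)\cup \mathcal{E}(\mathbb{C}^d))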
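Polarity means each ball's support function is the other norm: \(\sup_{X\in S_1^d}\text{Tr}(XY)=\|Y\|_\infty\) and \(\sup_{X\in S_\infty^d}\text{Tr}(XY)=\|Y\|_1\) for Hermitian \(Y\). A minimal numpy check, constructing the maximizers from an eigendecomposition (\(\pm|v\rangle\langle v|\) and \(\text{sign}(Y)\)); helper names are illustrative.

```python
import numpy as np

def random_hermitian(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (g + g.conj().T) / 2

def support_trace_ball(Y):
    """sup{Tr(XY): X in S_1^d} = ||Y||_inf, attained at X = sign(w_k) |v_k><v_k|."""
    w, v = np.linalg.eigh(Y)
    k = int(np.argmax(np.abs(w)))
    X = np.sign(w[k]) * np.outer(v[:, k], v[:, k].conj())
    return np.trace(X @ Y).real, np.max(np.abs(w))

def support_op_ball(Y):
    """sup{Tr(XY): X in S_inf^d} = ||Y||_1, attained at X = sign(Y)."""
    w, v = np.linalg.eigh(Y)
    X = v @ np.diag(np.sign(w)) @ v.conj().T
    return np.trace(X @ Y).real, np.sum(np.abs(w))

Y = random_hermitian(4, np.random.default_rng(3))
print(support_trace_ball(Y))  # both entries agree
print(support_op_ball(Y))     # both entries agree
```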

If \(\mathcal{S}=\{x_1,\ldots,x_n\}\subset B_X\) is \(\epsilon\)-shattered by \(B_{X^*}\), then for all \(a_1,\ldots,a_n\in\mathbb{R}\),
$$\epsilon\sum_{i=1}^n|a_i|\leq \left\|\sum_{i=1}^n a_i x_i\right\|_X.$$

Choose \(\{a_i\}\) to be independent and uniform \(\{+1,-1\}\) random variables. Then LHS = \(\epsilon n\) for every choice of signs, so taking expectations,

$$\epsilon n \leq \mathbb{E}\left\|\sum_{i=1}^n a_i x_i\right\|_X \leq C(n,d),$$

where \(C(n,d)\) is any upper bound on \(\mathbb{E}\left\|\sum_{i=1}^n a_i x_i\right\|_X\). Solving \(\epsilon n \leq C(n,d)\) for \(n\) bounds the size of every \(\epsilon\)-shattered set.

[1] Mendelson and Schechtman, The Shattering Dimension of Sets of Linear Functionals, The Annals of Probability, 32 (3A): 1746–1770, 2004.
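The two ingredients of this step can be sketched generically: a Monte Carlo estimate of the Rademacher average \(\mathbb{E}\|\sum_i a_i x_i\|\) for a user-supplied norm, and the largest \(n\) compatible with \(\epsilon n\leq C(n,d)\). Both helpers are hypothetical, and the loop in `max_shattered_size` assumes \(\epsilon n\leq C(n)\) holds on an initial segment of \(n\), as it does for \(C(n)\propto\sqrt{n}\).

```python
import math
import numpy as np

def rademacher_average(xs, norm, trials=2000, seed=0):
    """Monte Carlo estimate of E||sum_i a_i x_i|| with a_i i.i.d. uniform +-1."""
    rng = np.random.default_rng(seed)
    xs = np.asarray(xs)
    total = 0.0
    for _ in range(trials):
        a = rng.choice([-1.0, 1.0], size=len(xs))
        total += norm(np.tensordot(a, xs, axes=1))
    return total / trials

def max_shattered_size(C, eps, n_max=10**6):
    """Largest n <= n_max with eps * n <= C(n)."""
    n = 0
    while n < n_max and eps * (n + 1) <= C(n + 1):
        n += 1
    return n

# With C(n) = sqrt(n d), d = 4, eps = 0.5, the bound gives n <= d / eps^2 = 16.
print(max_shattered_size(lambda n: math.sqrt(4 * n), 0.5))  # 16
print(rademacher_average([1.0, 1.0], abs))                   # close to 1.0
```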


Learning Q. States

\(B_X:=S_\infty^d\) and \(B_{X^*}:= S_1^d\). By the matrix Rademacher bound (Tropp),

$$\mathbb{E}\left\|\sum_{i=1}^n a_i x_i\right\|_{\infty}\leq \sqrt{2\sigma^2 \log (2d)},\qquad \sigma^2=\left\|\mathbb{E}\left(\sum_{i=1}^n a_i x_i\right)^2\right\|_\infty=\left\|\sum_{i=1}^n x_i^2\right\|_\infty\leq n,$$

since \(\|x_i\|_\infty\leq 1\). Hence \(\epsilon n \leq \sqrt{2n\log (2d)}\), i.e. \(n\leq 2\log(2d)/\epsilon^2\).

Joel A. Tropp, Foundations of Computational Mathematics, 12 (4): 389–434, 2011.
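A numerical sanity check of the operator-norm step, assuming random Hermitian \(x_i\) rescaled to \(\|x_i\|_\infty=1\) (one arbitrary choice of points in \(S_\infty^d\)). The Monte Carlo estimate should sit below \(\sqrt{2n\log(2d)}\), which uses \(\sigma^2\leq n\). Helper names are illustrative.

```python
import math
import numpy as np

def random_hermitian_unit(d, rng):
    """Random Hermitian matrix rescaled so ||x||_inf = 1 (a point of S_inf^d)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (g + g.conj().T) / 2
    return h / np.linalg.norm(h, ord=2)

def op_norm_average(n, d, trials=500, seed=0):
    """Monte Carlo estimate of E||sum_i a_i x_i||_inf over random signs a_i."""
    rng = np.random.default_rng(seed)
    xs = np.array([random_hermitian_unit(d, rng) for _ in range(n)])
    total = 0.0
    for _ in range(trials):
        a = rng.choice([-1.0, 1.0], size=n)
        total += np.linalg.norm(np.tensordot(a, xs, axes=1), ord=2)
    return total / trials

n, d = 50, 8
bound = math.sqrt(2 * n * math.log(2 * d))
print(round(op_norm_average(n, d), 2), "<=", round(bound, 2))
```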

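Learning Measurement

\(B_X:=S_1^d\) and \(B_{X^*}:= S_\infty^d\). Since \(\|Y\|_1\leq\sqrt{d}\,\|Y\|_2\) and \(\mathbb{E}\left\|\sum_{i} a_i x_i\right\|_2^2=\sum_i\|x_i\|_2^2\leq n\),

$$\mathbb{E}\left\|\sum_{i=1}^n a_i x_i\right\|_{1}\leq \sqrt{n d}$$

[Noncommutative Khintchine inequalities]

Hence \(\epsilon n \leq \sqrt{nd}\), i.e. \(n\leq d/\epsilon^2\).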
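The trace-norm step can be checked the same way, assuming random density matrices as the points \(x_i\in S_1^d\) (an illustrative choice). The trace norm is computed as the sum of absolute eigenvalues; the estimate should stay below \(\sqrt{nd}\). Helper names are hypothetical.

```python
import math
import numpy as np

def random_density(d, rng):
    """Random density matrix G G^dag / Tr(G G^dag); trace norm 1."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def trace_norm(y):
    """||y||_1 for Hermitian y: sum of absolute eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(y))))

def trace_norm_average(n, d, trials=500, seed=0):
    """Monte Carlo estimate of E||sum_i a_i x_i||_1 over random signs a_i."""
    rng = np.random.default_rng(seed)
    xs = np.array([random_density(d, rng) for _ in range(n)])
    total = 0.0
    for _ in range(trials):
        a = rng.choice([-1.0, 1.0], size=n)
        total += trace_norm(np.tensordot(a, xs, axes=1))
    return total / trials

n, d = 50, 8
print(round(trace_norm_average(n, d), 2), "<=", round(math.sqrt(n * d), 2))
```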

Final Remark

The argument can also be set up with the dual pair of Schatten-class unit balls \(B_X:=S_p^d\) and \(B_{X^*}:= S_q^d\), where

\frac{1}{p}+\frac{1}{q}=1
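The duality between \(S_p^d\) and \(S_q^d\) rests on Hölder's inequality for Schatten norms, \(|\text{Tr}(XY)|\leq\|X\|_p\|Y\|_q\). A small numerical check, computing Schatten norms from singular values; `schatten` and `holder_gap` are illustrative names.

```python
import numpy as np

def schatten(x, p):
    """Schatten p-norm: l_p norm of the singular values (p = inf allowed)."""
    s = np.linalg.svd(x, compute_uv=False)
    return float(s.max()) if np.isinf(p) else float(np.sum(s**p) ** (1.0 / p))

def holder_gap(x, y, p):
    """||x||_p ||y||_q - |Tr(xy)| with 1/p + 1/q = 1; nonnegative by Hoelder."""
    q = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1))
    return schatten(x, p) * schatten(y, q) - abs(np.trace(x @ y))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
y = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
for p in (1, 1.5, 2, 3, np.inf):
    print(p, holder_gap(x, y, p) >= 0)
```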

Open Questions

What is the sample complexity of learning quantum maps?

Thank you for your attention!