# Learning Quantum Objects

1st International Workshop on

Quantum Software and Quantum Machine Learning (QSML)

Min-Hsiu Hsieh

UTS: Centre for Quantum Software and Information

# This talk concerns

## QIP 2018 Tutorial

Ronald de Wolf | CWI, University of Amsterdam
Title: Quantum Learning Theory

• Complexity of Learning
• Full Quantum Settings

Hao-Chung Cheng, Min-Hsiu Hsieh, and Ping-Cheng Yeh. The learnability of unknown quantum measurements. QIC 16(7&8):615–656 (2016).

The standard learning picture:

- Unknown function: $$f: X\to Y$$
- Training data: $$\{(x_i,y_i)\}_{i=1}^N$$
- Hypothesis set: $$\mathcal{H}$$
- Learning algorithm: outputs a hypothesis $$\hat{f}$$

Two resources measure the cost of learning: computational complexity and sample complexity.

The quantum analogue: an unknown state $$\rho$$ induces a function on measurement effects.

- Unknown function: $$f_\rho : \mathcal{E}(\mathcal{H}) \to \mathbb{R}$$, $$f_\rho(E) = \text{Tr}(E\rho)$$
- Hypothesis set: $$\{f_{\rho}:\rho\in \mathcal{D}(\mathcal{H})\}$$
- Training data: $$\{(E_i,f_\rho(E_i))\}_{i=1}^N$$
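As a concrete illustration of this data model (my own hypothetical sketch, not code from the talk; the sampling choices are arbitrary), the snippet below draws a random qubit state $$\rho$$ and generates training pairs $$(E_i, \text{Tr}(E_i\rho))$$ with random effects:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(d):
    # A random d-dimensional density matrix: rho >= 0, Tr(rho) = 1.
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho)

def random_effect(d):
    # A random measurement effect 0 <= E <= I, obtained by squashing
    # the eigenvalues of a random Hermitian matrix into [0, 1].
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    H = (G + G.conj().T) / 2
    w, V = np.linalg.eigh(H)
    w = (w - w.min()) / (w.max() - w.min())
    return V @ np.diag(w) @ V.conj().T

d, N = 2, 100
rho = random_density_matrix(d)  # the unknown state
data = [(E, np.trace(E @ rho).real)
        for E in (random_effect(d) for _ in range(N))]
```

Each label $$f_\rho(E_i)=\text{Tr}(E_i\rho)$$ is an outcome probability in $$[0,1]$$.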

Photo Credit: Akram Youssry

The classical setting: training data $$\{(x_i,y_i)\}_{i=1}^N$$ drawn from an unknown function $$f: X\to Y$$, together with a hypothesis set $$\mathcal{H}$$.

## Given a loss function

$$\ell:Y\times Y \to \mathbb{R}$$

## find

$$f_n = \arg \min_{h\in \mathcal{H}} R_n (h)$$

## In-Sample Error

$$R_n(h) = \frac{1}{N}\sum_{i=1}^N \ell (h(x_i), y_i)$$

## Out-of-Sample Error

$$R(h) = \mathbb{E}_{X\sim\mu} [\ell (h(X), Y) ]$$

The generalization gap is controlled uniformly over the hypothesis set:

$$|R(f_n) - R_n(f_n)| \leq \sup_{h\in\mathcal{H}}|R(h) - R_n(h)| \leq \mathrm{Bound}(n, \mathcal{H}).$$
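A toy end-to-end run of this recipe (my illustration; the threshold-classifier hypothesis set and 0-1 loss are arbitrary choices, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Unknown target f and i.i.d. training data {(x_i, y_i)}.
f = lambda x: np.sign(np.sin(3 * x))
x_train = rng.uniform(-1, 1, size=50)
y_train = f(x_train)

# A small hypothesis set H: threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(-1, 1, 41)
hypotheses = [lambda x, t=t: np.sign(x - t) for t in thresholds]

def empirical_risk(h, xs, ys):
    # In-sample error R_n(h) under the 0-1 loss.
    return np.mean(h(xs) != ys)

# ERM: pick the hypothesis minimizing the in-sample error.
risks = [empirical_risk(h, x_train, y_train) for h in hypotheses]
f_n = hypotheses[int(np.argmin(risks))]

# Estimate the out-of-sample error R(f_n) on fresh samples.
x_test = rng.uniform(-1, 1, size=10_000)
print("in-sample:", min(risks),
      "out-of-sample:", empirical_risk(f_n, x_test, f(x_test)))
```

The gap between the two printed errors is exactly the quantity the uniform bound above controls.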

# For Boolean Functions $$\mathcal{H}$$

- Vapnik, Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York/Berlin, 1982.
- Blumer, Ehrenfeucht, Haussler, and Warmuth, J. Assoc. Comput. Mach., vol. 36, no. 4, pp. 151–160, 1989.

# For Real Functions $$\mathcal{H}$$

$$m_{\mathcal{H}}= \frac{C}{\epsilon^2}\left(\text{fat}_{\mathcal{H}}\left(\frac{\epsilon}{8}\right)\cdot\log\left(\frac{2}{\epsilon}\right)+\log\left(\frac{8}{\delta}\right)\right)$$

- Bartlett, Long, and Williamson, J. Comput. System Sci., vol. 52, no. 3, pp. 434–452, 1996.
- Alon, Ben-David, Cesa-Bianchi, and Haussler, J. ACM, vol. 44, no. 4, pp. 616–631, 1997.
- Mendelson, Inventiones Mathematicae, vol. 152, pp. 37–55, 2003.
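For intuition only, one can plug numbers into the bound above; the constant $$C$$ and the fat-shattering values below are made-up placeholders, since the theorem fixes the bound only up to constants:

```python
import math

def sample_complexity(fat, eps, delta, C=1.0):
    # m_H = (C / eps^2) * (fat_H(eps/8) * log(2/eps) + log(8/delta))
    return (C / eps ** 2) * (fat * math.log(2 / eps) + math.log(8 / delta))

# Halving the accuracy parameter eps roughly quadruples the sample
# complexity, up to logarithmic factors.
m1 = sample_complexity(fat=10, eps=0.10, delta=0.01)
m2 = sample_complexity(fat=10, eps=0.05, delta=0.01)
```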

# Sample Complexity for Learning Quantum Objects

## Learning an Unknown State

- Unknown function: $$f_\rho : \mathcal{E}(\mathcal{H}) \to \mathbb{R}$$, $$f_\rho(E) = \text{Tr}(E\rho)$$
- Hypothesis set: $$\{f_{\rho}:\rho\in \mathcal{D}(\mathcal{H})\}$$
- Training data: $$\{(E_i,f_\rho(E_i))\}_{i=1}^N$$

## Learning an Unknown Measurement

- Unknown function: $$f_E: \mathcal{D}(\mathcal{H}) \to \mathbb{R}$$, $$f_E(\rho) = \text{Tr}(E\rho)$$
- Hypothesis set: $$\{f_{E}:E\in \mathcal{E}(\mathcal{H})\}$$
- Training data: $$\{(\rho_i,f_E(\rho_i))\}_{i=1}^N$$

# Learning Unknown Measurements

The unknown object is now the measurement effect $$E$$, and the training examples are input states labeled by outcome probabilities.

- Unknown function: $$f_E : \mathcal{D}(\mathcal{H}) \to \mathbb{R}$$, $$f_E(\rho) = \text{Tr}(E\rho)$$
- Hypothesis set: $$\{f_{E}:E\in \mathcal{E}(\mathcal{H})\}$$
- Training data: $$\{(\rho_i,f_E(\rho_i))\}_{i=1}^N$$

This is the dual of learning an unknown state $$\rho$$ from effect-labeled data $$\{(E_i, \text{Tr}(E_i\rho))\}_{i=1}^N$$.
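Mirroring the state-learning picture, hypothetical training data for an unknown effect (here a projector onto $$|0\rangle$$, chosen arbitrarily for illustration; none of this is the talk's code) might be generated as:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_pure_state(d):
    # A random pure state as a density matrix |v><v|.
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

d, N = 2, 100
E = np.diag([1.0, 0.0])  # the unknown effect: projector onto |0>

# Training data {(rho_i, f_E(rho_i))} with f_E(rho) = Tr(E rho).
data = [(rho, np.trace(E @ rho).real)
        for rho in (random_pure_state(d) for _ in range(N))]
```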

# Technical Merits

- ### You don't need to know quantum mechanics.

$$S_1^d$$ and $$S_\infty^d$$ are polar to each other, where

$$S_1^d=\text{conv}(-\mathcal{D}(\mathbb{C}^d)\cup \mathcal{D}(\mathbb{C}^d)),\qquad S_\infty^d=\text{conv}(-\mathcal{E}(\mathbb{C}^d)\cup \mathcal{E}(\mathbb{C}^d)).$$
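The norm duality behind this polarity can be sanity-checked numerically in the $$p=1,\ q=\infty$$ case (my own check, not from the talk): the trace norm is the supremum of the trace pairing over the operator-norm unit ball, attained at $$B=\operatorname{sign}(A)$$.

```python
import numpy as np

rng = np.random.default_rng(3)

d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (G + G.conj().T) / 2  # a random Hermitian matrix

# ||A||_1 = sup { Tr(A B) : ||B||_inf <= 1 }, maximized by B = sign(A).
w, V = np.linalg.eigh(A)
B = V @ np.diag(np.sign(w)) @ V.conj().T
lhs = np.trace(A @ B).real
rhs = np.abs(w).sum()  # trace norm of A: sum of |eigenvalues|
```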

# State–Measurement Duality

$$\mathcal{S}=\{x_1,\ldots,x_n\}\subset B_X$$ is $$\epsilon$$-shattered by $$B_{X^*}$$ if, for all $$a_1,\ldots,a_n\in\mathbb{R}$$,

$$\epsilon\sum_{i=1}^n|a_i|\leq \left\|\sum_{i=1}^n a_i x_i\right\|_\mathcal{X}.$$

Taking $$a_i=\pm1$$ makes the left-hand side equal to $$\epsilon n$$, so a shattered set of size $$n$$ forces $$\epsilon n \leq C(n,d)$$, where $$C(n,d)$$ is an upper bound on the norm of the signed sum.

- Mendelson and Schechtman, The Shattering Dimension of Sets of Linear Functionals, Ann. Probab., 32(3A): 1746–1770, 2004.

# Learning Quantum States

## $$\leq \sqrt{2\sigma^2 \log d}$$

- Joel A. Tropp, Foundations of Computational Mathematics, 12(4): 389–434, 2011.
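A small simulation of the flavor of this bound (my own check, not the talk's code): for a Rademacher series of fixed Hermitian matrices, the average operator norm stays below the matrix-concentration bound $$\sqrt{2\sigma^2\log(2d)}$$ with variance proxy $$\sigma^2 = \|\sum_i A_i^2\|$$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 8

# Fixed Hermitian (here real symmetric) matrices A_i.
As = []
for _ in range(n):
    G = rng.normal(size=(d, d))
    As.append((G + G.T) / 2)

# Matrix variance parameter: sigma^2 = || sum_i A_i^2 ||.
sigma = np.linalg.norm(sum(A @ A for A in As), 2) ** 0.5

# Average operator norm of the Rademacher series sum_i eps_i A_i.
trials = [np.linalg.norm(sum(e * A for e, A in
                             zip(rng.choice([-1, 1], size=n), As)), 2)
          for _ in range(200)]
avg = float(np.mean(trials))
bound = sigma * np.sqrt(2 * np.log(2 * d))
```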

# Learning Measurement

## $$\leq \sqrt{n d}$$

[Noncommutative Khintchine inequalities]

# Final Remark

## $$B_X:=S_p^d$$ and $$B_{X^*}:= S_q^d$$

$$\frac{1}{p}+\frac{1}{q}=1$$
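Spelled out (standard Schatten-norm duality, added here for completeness), the pairing behind this choice of balls is

$$\|A\|_{S_p}=\sup\left\{\left|\text{Tr}(A^\dagger B)\right| : \|B\|_{S_q}\leq 1\right\},\qquad \frac{1}{p}+\frac{1}{q}=1.$$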