Quantum algorithms for machine learning

PhD thesis defense of

Alessandro Luongo

20 November 2020

Supervisor: Iordanis Kerenidis

Co-supervisor: Frédéric Magniez.

aluongo@irif.fr

This thesis addresses the question:

Is machine learning a promising domain for quantum algorithms*?

* to be run on fault-tolerant quantum computers with quantum access to classical data

Runtime

Input \( X \in \mathbb{R}^{n \times d} \text{ with } n \gg d\)

  • Worst-case classical algorithms: \( O\left(nd^2 \right) \)

  • Randomized classical algorithms: \( O\left(\|X\|_0 \textcolor{red}{\times} \text{poly}( \kappa(X), \epsilon, ...) \right) \)

  • Quantum algorithms: \( O\left(\|X\|_0 \right) \textcolor{red}{+} O\left( \text{poly}(\kappa(X), \epsilon, \mu(X), ...) \right) \)


The end of Moore's law?

Size of benchmark datasets over time: the share of respondents whose largest analyzed dataset was in the GB range was 52.8% (2013), 54.3% (2014), 55.6% (2015), and 56.0% (2018).

What was the largest dataset you analyzed?

https://www.kdnuggets.com/2018/10/poll-results-largest-dataset-analyzed.html

Quantum!


Contributions

  • Quantum classification of the MNIST dataset via slow feature analysis. I. Kerenidis, AL - PRA. [QSFA] (supervised ML, dimensionality reduction, classification, experiments on real data)

  • q-means: A quantum algorithm for unsupervised machine learning.

    I. Kerenidis, J. Landman, AL, A. Prakash - NeurIPS2019 [QMEANS] (unsupervised ML, clustering, experiments on real data)

  • Quantum Expectation-Maximization for Gaussian mixture models.

    I. Kerenidis, AL, A. Prakash - ICML2020 [QEM] (unsupervised ML, clustering, experiments on real data)

  • Quantum algorithms for spectral sums.  C. Shao, AL - arXiv:2011.06475 [QSS] (quantum algorithms numerical linear algebra)

  • Application of quantum algorithms for spectral sums. AL - (to appear) [AQSS] (statistics, applications)

  • Quantum algorithms for data representation. A. Bellante, AL - (to appear) [QADR] (natural language processing, experiments on real data, eigenvalue problems)


The QML toolkit

  • Query access to matrices

  • Quantum linear algebra

  • Distance estimations

  • Singular Value Estimation

  • Tomography (of pure states)

  • Hamiltonian simulation

  • Amplitude estimation and amplification

  • Singular value transformations

  • Polynomial approximations

  • ...

Quantum access to a matrix

  • Classical preprocessing time: \( O(\textcolor{red}{nd} \log nd) \)

  • Classical space: \( O\left( \textcolor{red}{nd} \log nd \right) \)

  • Query time: \( O(\textcolor{red}{\log nd}) \)

  • Quantum space: \( O \left(\textcolor{red}{ \log nd }\right) \)

 Iordanis Kerenidis, Anupam Prakash - 8th Innovations in Theoretical Computer Science Conference - ITCS 2017.

Anupam Prakash - Quantum algorithms for linear algebra and machine learning. Diss. UC Berkeley - 2014.

Quantum query:

\( |i\rangle |0 \rangle \mapsto |i\rangle |x_i\rangle \),  and in superposition:  \( \frac{1}{\sqrt{n}}\sum_i |i\rangle |0 \rangle \mapsto \frac{1}{\sqrt{n}}\sum_i |i\rangle |x_i\rangle \)

where \(|x_i\rangle=\frac{1}{\|x_i\|_2} x_i = \frac{1}{\|x_i\|_2} \sum_j (x_i)_j |j\rangle \)

\( X \in \mathbb{R}^{\textcolor{red}{n \times d}} = [x_1, \dots, x_n]^T \)    \( x_i \in \mathbb{R}^d\)

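As an illustration, here is a minimal classical sketch of the tree data structure behind these costs: for each row \(x_i\), a complete binary tree stores the squared entries at the leaves and partial sums at internal nodes, so preprocessing costs \(O(nd \log nd)\) and the values needed to prepare \(|x_i\rangle\) sit on a root-to-leaf path of length \(O(\log d)\). The helper names are mine and signs/phases are omitted; this is a sketch, not the exact ITCS 2017 construction.

```python
import numpy as np

def build_row_tree(x):
    """Binary tree over the squared entries of a row x (root = ||x||^2)."""
    d = 1 << int(np.ceil(np.log2(len(x))))        # pad to a power of two
    level = np.zeros(d)
    level[:len(x)] = np.asarray(x, dtype=float) ** 2
    levels = [level]
    while len(level) > 1:
        level = level.reshape(-1, 2).sum(axis=1)  # parent = sum of children
        levels.append(level)
    return levels[::-1]                           # levels[0] is the root

def rotation_angles(levels, j):
    """Angles a state-preparation circuit would apply on the path to leaf j."""
    angles = []
    for depth in range(len(levels) - 1):
        parent = j >> (len(levels) - 1 - depth)   # ancestor of leaf j
        left = levels[depth + 1][2 * parent]      # mass of the left child
        total = levels[depth][parent]
        angles.append(np.arccos(np.sqrt(left / total)) if total > 0 else 0.0)
    return angles

x = np.array([0.5, 0.5, 0.5, 0.5])
levels = build_row_tree(x)
print(levels[0][0])                # ||x||^2 = 1.0
print(rotation_angles(levels, 2))  # two pi/4 rotations for this uniform row
```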

Quantum linear systems of equations

Given:

  • quantum (sparse) access to \(A \in \mathbb{R}^{n \times n}\),

  • and a vector \(x \in \mathbb{R}^n\),

the HHL algorithm produces a state \(|z\rangle\) such that

\( \| |z\rangle - |\textcolor{red}{A^{-1}x}\rangle \| \leq \epsilon \)

in time \( \widetilde{O}\left(\frac{\kappa^2(A)s^2(A)}{\epsilon}\right) \) with sparse access, or \( \widetilde{O}(\|A\|_0) + \widetilde{O}\left(\frac{\kappa^2(A)\mu^2(A)}{\epsilon}\right) \) with quantum access,

where \( \kappa(A) = \frac{\sigma_1(A)}{\sigma_n(A)} \) is the condition number and \( s \) is the maximum row sparsity.

Harrow, Aram, Avinatan Hassidim, Seth Lloyd - Physical Review Letters - 2009.
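As a point of reference, a small classical illustration of what the guarantee refers to: the output state encodes the normalized solution \(A^{-1}x/\|A^{-1}x\|\), not the full vector, and the runtime is governed by \(\kappa(A)\). All names below are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
B = rng.standard_normal((n, n))
A = np.eye(n) + 0.1 * (B + B.T) / 2     # symmetric, well-conditioned
x = rng.standard_normal(n)

z = np.linalg.solve(A, x)               # A^{-1} x
z_state = z / np.linalg.norm(z)         # what |z> encodes

sigma = np.linalg.svd(A, compute_uv=False)
kappa = sigma[0] / sigma[-1]            # condition number kappa(A)
print(kappa)                            # runtime scales with kappa^2
```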


Quantum singular value transformations

Given a matrix \(A \in \mathbb{R}^{n \times m}\) and a vector \(x \in \mathbb{R}^m\), it is possible to produce a state \(|z\rangle\) s.t.

  • \( \| |z\rangle - |\textcolor{red}{Ax}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\kappa(A)\mu(A){\log (1/\epsilon)}) \)

  • \( \| |z\rangle - |\textcolor{red}{A^{-1}x}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\kappa(A)\mu(A){\log (1/\epsilon)}) \)

  • \( \| |z\rangle - |\textcolor{red}{A_{k}^{\dagger}A_{k}x}\rangle \| \leq \epsilon \) in \( \widetilde{O}\left( \frac{ \kappa(A)\mu(A)}{\delta\epsilon} \right) \)

In general:

\( \| |z\rangle - |\textcolor{red}{f(A)x}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\text{poly}(\kappa(A), \mu(A)){\log (1/\epsilon)}) \)

where \(f(A) = \sum_i^d f(\sigma_i)|u_i\rangle \langle v_i| \) and

\(\mu(A) = \min\left(\|A\|_F, \sqrt{\max_{i \in [n]} \|a_i\|_{2p}^{2p} \max_{i \in [d]} \|a_{*i}\|_{2(1-p)}^{2(1-p)} }\right) \)

Iordanis Kerenidis, Anupam Prakash - 8th Innovations in Theoretical Computer Science Conference - 2017.

András Gilyén, et al. - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing - 2019.

Guang Hao Low, Isaac L. Chuang - Physical Review Letters - 2017.
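Classically, a singular value transformation just applies \(f\) to the singular values. A minimal numpy sketch (using a symmetric matrix, where \(U = V\) up to signs and \(f(\sigma) = 1/\sigma\) really recovers the inverse):

```python
import numpy as np

def svt(A, f):
    """Classical analogue of QSVT: f(A) = sum_i f(sigma_i) |u_i><v_i|."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(f(s)) @ Vt

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric positive definite
x = np.array([1.0, 1.0])

z = svt(A, lambda s: 1.0 / s) @ x  # f(s) = 1/s: the A^{-1}x case above
z_state = z / np.linalg.norm(z)    # what the quantum state |z> encodes
print(np.allclose(A @ z, x))       # True
```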

Distance estimation subroutines

\( X \in \mathbb{R}^{n \times d} = [x_1, \dots, x_n]^T \)    \( x_i \in \mathbb{R}^d\)

  • Euclidean distance [QMEANS]: \(|i,j\rangle \mapsto |i,j ,\|x_i- x_j\|_2\rangle  \) in \( \widetilde{O}(\frac{1 }{\epsilon} ) \)

  • Distance induced by \(A\) [QEM]: \(|i,j\rangle \mapsto |i,j ,d_A(x_i,  x_j) \rangle  \) in \( \widetilde{O}(\frac{\mu(A)}{\epsilon} ) \)

  • Quadratic forms [Thesis]: \(|i,j\rangle \mapsto |i,j ,x_i^TA^\textcolor{orange}{-1}x_j \rangle  \) in \( \widetilde{O}(\frac{\mu(A)\textcolor{orange}{\kappa(A)}}{\epsilon} ) \)
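For reference, the classical quantities these subroutines estimate (reading \(d_A\) as the quadratic-form distance induced by \(A\) is my illustration; [QEM] defines the exact metric):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x_i, x_j = rng.standard_normal(d), rng.standard_normal(d)
B = rng.standard_normal((d, d))
A = B @ B.T + np.eye(d)                          # SPD, so A^{-1} exists

euclid = np.linalg.norm(x_i - x_j)               # ||x_i - x_j||_2
d_A = np.sqrt((x_i - x_j) @ A @ (x_i - x_j))     # a distance induced by A
quad = x_i @ np.linalg.solve(A, x_j)             # x_i^T A^{-1} x_j
print(euclid, d_A, quad)
```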

Tomography of pure states

We can produce an estimate \(\overline{x}\) of \(|x\rangle\) such that \(\|\overline{x}\| = 1\) and

  • \( \| \overline{x} - |x\rangle \|_{\textcolor{red}{2}} \leq \epsilon \) using \( O\left(\frac{d\log d}{\epsilon^2}\right) \) samples, or

  • \( \| \overline{x} - |x\rangle \|_{\textcolor{red}{\infty}} \leq \epsilon \) using \( O\left(\frac{\log d}{\epsilon^2}\right) \) samples.

Iordanis Kerenidis, Anupam Prakash - ACM Transactions on Quantum Computing - 2020.

Iordanis Kerenidis, Anupam Prakash, Jonas Landman - International Conference on Learning Representations - 2019.
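A toy classical simulation of the \(\ell_\infty\) guarantee: sample the state \(N = O(\log d/\epsilon^2)\) times in the computational basis and estimate the amplitude magnitudes. Sign recovery, which the real algorithm also handles, is omitted, and the constant 36 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 256
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                       # the unknown state |x>

eps = 0.05
N = int(36 * np.log(d) / eps**2)             # O(log d / eps^2) samples
p = x**2
p /= p.sum()                                 # measurement probabilities
counts = np.bincount(rng.choice(d, size=N, p=p), minlength=d)
x_bar = np.sqrt(counts / N)                  # entrywise estimate of |x_k|

print(np.max(np.abs(x_bar - np.abs(x))))     # l_inf error, well below eps
```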

SVE of product of matrices

Theorem: Assume quantum access to \(A, B \in \mathbb{R}^{n \times n} \).

There is an algorithm that performs the mapping

\( \sum_i \alpha_i |i\rangle \mapsto \sum_i \alpha_i|i, \sigma_i \rangle \)

where \(\sigma_i \) is the i-th singular value of \(AB\) in time

\(O\left(\frac{(\kappa(A)+\kappa(B))( \mu(A)+\mu(B))}{\epsilon}\right) \)

Shantanav Chakraborty, et al. - 46th International Colloquium on Automata, Languages, and Programming - 2019.

[QSFA]

Supervised learning

  • \(X \in \mathbb{R}^{n \times d} \) dataset

  • \( L \in [K]^{n} \) labels

(classification)

Quantum supervised learning

  • Quantum slow feature analysis for dimensionality reduction

  • Quantum Frobenius distance classifier: a simple NISQ classifier

  • Simulation on MNIST dataset of handwritten digits

  • Extension to other generalized eigenvalue problems in ML

[QSFA, Thesis]

https://cmp.felk.cvut.cz/cmp/software/stprtool

Slow Feature Analysis

\(X \in \mathbb{R}^{n \times \textcolor{orange}{d}} \) (images)    \(L \in [K]^{n}  \) (labels)

\(Y \in \mathbb{R}^{n \times \textcolor{orange}{K}} \),    \( y_i =\left[ w_1^Tx_i, \dots, w_K^Tx_i\right] \)

Finding the model \( \{w_j\}_{j=1}^K \) reduces to a constrained optimization problem.

Constraints:

  • The componentwise average should be zero: \( \langle Y_{*j}\rangle =0 \)

  • The componentwise variance should be \(1\): \( \langle Y^2_{ij}\rangle = 1 \)

  • Signals are maximally uncorrelated: \( \forall j' <  j : \langle Y_{*j'} Y_{*j} \rangle  = 0 \)

Slow Feature Analysis

Finding the model \( \{w_j\}_{j=1}^K \) reduces to an optimization problem: minimize, under the constraints above, the slowness

\( \Delta_j =  \frac{1}{a} \sum_{k=1}^K \sum_{\substack{s,t \in T_k \\ s<t}} (w_j^Tx^{(k)}_s - w_j^Tx^{(k)}_t)^2 \)
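Below is a minimal classical sketch of this optimization, using the reduction to the generalized eigenvalue problem \(AW = BW\Lambda\) with \(A = \dot X^T \dot X\) and \(B = X^T X\) (see the GEP slide later). The helper names and the within-class differencing used to build \(\dot X\) are my illustration, not the thesis' exact construction.

```python
import numpy as np
from scipy.linalg import eigh

def sfa(X, labels, K):
    """Supervised SFA: the K generalized eigenvectors with smallest slowness."""
    X = X - X.mean(axis=0)                     # zero componentwise average
    # within-class successive differences play the role of Xdot
    diffs = [np.diff(X[labels == c], axis=0) for c in np.unique(labels)]
    Xdot = np.vstack(diffs)
    A = Xdot.T @ Xdot
    B = X.T @ X
    # eigh solves A w = lambda B w; smallest eigenvalues = slowest features
    eigvals, W = eigh(A, B)
    return W[:, :K]                            # the model {w_j}

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 10))
labels = rng.integers(0, 3, size=120)
W = sfa(X, labels, K=2)
print((X @ W).shape)                           # (120, 2): slow feature space
```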

Quantum Slow Feature Analysis

Theorem:

  • Quantum access to \(X\) and to the derivative matrix \(\dot{X}\)

  • Let \( \epsilon, \theta, \delta, \eta >0\)

     There are quantum algorithms to:

  • Map the dataset into the slow feature space \( |\overline{Y}\rangle \)  in time: $$ \tilde{O}\left(  \frac{ ( \kappa(X) + \kappa(\dot{X})) ( \mu({X})+ \mu(\dot{X}) ) }{\delta\theta}  \gamma_{K-1} \right) $$

  • Find \( \{ w_j\}_{j=0}^{K}\) in time:    $$O\left(d^{1.5}\sqrt{K}\frac{\kappa(X)\kappa(\dot X X)(\mu(X) + \mu(\dot{X}))}{\epsilon^2}\right)$$

[QSFA]

QFDC: classification in slow feature space

Theorem: Assume we can prepare \(|Y\rangle\) in time \(T\); then we can label images into \(k\) classes in time \(O(\frac{kT}{\epsilon}) \).

Definition (QFDC, Quantum Frobenius distance classifier): a point is assigned to the cluster with the smallest normalized average squared distance between the point and the points of the cluster.

[QSFA]

We get high accuracy with a fast quantum classifier

Classification of handwritten digits of MNIST dataset

Ex: polynomial expansion of degree 2:

\([x_1, x_2, x_3] \mapsto [x_1^2, x_1x_2, \dots, x_3^2 ]\)


Malware detection via DGA classification

Accuracy:

                      Original classifier    With slow features
Logistic Regression   89%                    90.5% (+1.5%)
Naive Bayes           89.3%                  92.3% (+3%)
Decision Trees        91.4%                  94.0% (+2.6%)

[Thesis]

  • Using SFA in a classification problem improves its accuracy

  • QSFA can process old datasets in new ways!

SFA as an instance of a more general problem

The GEP (Generalized Eigenvalue Problem) is defined as:

\( AW = BW\Lambda \)

In SFA:

  • \(A=\dot X^T \dot X \)
  • \(B=X^TX\)

Many models in ML reduce to a GEP:

  • ICA Independent Component Analysis

  • G-IBM Gaussian Information Bottleneck Method

  • CCA Canonical Correlation Analysis

  • SC (some) Spectral Clustering [1]

  • PLS Partial Least Squares

  • LE Laplacian Eigenmaps

  • FLD Fisher Linear Discriminant

  • SFA Slow Feature Analysis

  • KPCA Kernel Principal Component Analysis

[1] Iordanis Kerenidis, Jonas Landman - Quantum spectral clustering - arXiv:2007.00280 - 2020.


Unsupervised learning

  • \(X \in \mathbb{R}^{n \times d} \)

(clustering)

Quantum unsupervised learning

  • q-means for clustering (quantum version of k-means)

  • Quantum Expectation-Maximization

  • Simulation on VoxForge dataset for speaker recognition

The k-means algorithm:

\( t \leftarrow 0 \)

Step 1:

  • Compute the distance between every point \(x_i \) and every centroid \(\mu_j^{t} \):  $$ d(x_i, \mu_j^{t}) $$

  •  Assign each point to the closest cluster: $$ l(x_i) = \argmin_{c \in [K]}{ d(x_i, \mu_c^{t})}$$

 Step 2:

  • Compute the barycenters:  $$\mu_j^{t+1} = \frac{1}{|C_j|}\sum_{i \in C_j}^{} x_i $$

\( t \leftarrow t+1 \)

Runtime: \(O(tkdn) \) (a classical sketch follows below)
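A minimal classical k-means sketch matching the two steps above; the \(O(tkdn)\) cost is the distance computation repeated for \(t\) iterations. Helper names are mine.

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # Step 1: distances d(x_i, mu_j) and closest-cluster assignment
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: barycenters mu_j = mean of the points in cluster j
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(K)
        ])
    return centroids, labels

X = np.random.default_rng(4).standard_normal((200, 2))
centroids, labels = kmeans(X, K=3)
print(centroids.shape)   # (3, 2)
```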

q-means

\( t \leftarrow 0 \)

Step 1:

  • Compute the distance between every point \(v_i \) and every centroid \(\mu_j^{t} \):  $$|i,j\rangle \mapsto  |i,j,d(v_i, \mu_j^{t})\rangle $$

  •  Generate the characteristic vector of each cluster: $$ |\chi_j \rangle = \frac{1}{\sqrt{|C_j|}}\sum_{i \in C_j} |i \rangle$$

Step 2:

  • Use quantum linear algebra to build $$|\mu_j^{t+1}\rangle = \frac{1}{|C_j|}\sum_{i \in C_j}^{} |v_i\rangle $$

  • Perform tomography on \( |\mu_j^{t+1}\rangle \)

  • Build quantum access to \( \mu_j\).

\( t \leftarrow t+1 \)

 

q-means

Theorem: Given quantum access to a matrix \(X \in \mathbb{R}^{n \times d} \), there is a quantum algorithm that fits a k-means model, returning centroids with \( \| \mu_j - \mu_j^* \| \leq \delta \), in time:

\(    \widetilde{O}\left( k^2 d \frac{\eta^{2.5}}{\delta^3}  \right) \),    where \( \eta = \max_i (\|x_i\|^2 ) \)

Classical: \( O\left(\textcolor{red}{n}kd\right) \)

[QMEANS]

k-means learns the clusters' barycenters

For a dataset \( \{ x_i \}_{i}^n \), it finds centroids \(\{ \mu_j\}_{j=1}^K\) such that:

\( \{\mu_j\}_{j=1}^K = \text{argmin} \sum_i^n d(x_i, \mu_{l(x_i)}) \)

i.e. it minimizes the distance between points and their cluster's barycenter.

q-means, viewed as Expectation-Maximization: Step 1 (distance estimation and cluster assignment) is the Expectation step; Step 2 (building \( |\mu_j^{t+1}\rangle \) with quantum linear algebra, tomography, and quantum access to \( \mu_j \)) is the Maximization step.

Gaussian mixture models

\( k \) labels, drawn from a multinomial distribution; each component is a Gaussian distribution. Parameters:

\( \gamma = [\theta, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k ] \)

Maximum Likelihood Estimation:

\( \gamma^* =   \text{argmax}_\gamma \prod_i \sum_{j \in [k]} \theta_j p(x_i|\mu_j,\Sigma_j) \)

Error introduced by the quantum algorithm's parameters:

  • \(\|\theta-\overline{\theta}\|<\delta_\theta \)
  • \(\|\mu_j - \overline{\mu_j}\| < \delta_\mu \)
  • \(\|\Sigma_j - \overline{\Sigma_j}\| < \delta_\mu\sqrt{\eta}\)

Expectation-Maximization

\( t = 0 \)

Repeat:

  • Expectation:   \[ r_{ij}^t \leftarrow \frac{\theta^t_j N(v_i; \mu^t_j, \Sigma^t_j )}{\sum_{l=1}^k \theta^t_l N(v_i; \mu^t_l, \Sigma^t_l)}\]

  • Maximization: update the parameters \(\theta, \mu, \Sigma \) using the responsibilities \(r_{ij} \)

  • \( t \leftarrow t+1 \)

Until  \( | \ell(\gamma^{t-1};V) - \ell(\gamma^t;V) | < \tau \)
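For reference, a minimal classical EM sketch for a GMM, matching the loop above (helper names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(V, k, iters=50, tau=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = V.shape
    theta = np.full(k, 1.0 / k)                    # mixing weights
    mu = V[rng.choice(n, k, replace=False)]        # initial means
    sigma = np.array([np.cov(V.T) + 1e-6 * np.eye(d) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(iters):
        # Expectation: responsibilities r_ij
        dens = np.column_stack([
            theta[j] * multivariate_normal.pdf(V, mu[j], sigma[j])
            for j in range(k)
        ])
        ll = np.sum(np.log(dens.sum(axis=1)))      # log-likelihood l(gamma; V)
        r = dens / dens.sum(axis=1, keepdims=True)
        # Maximization: update theta, mu, sigma from r_ij
        nk = r.sum(axis=0)
        theta = nk / n
        mu = (r.T @ V) / nk[:, None]
        for j in range(k):
            C = V - mu[j]
            sigma[j] = (r[:, j, None] * C).T @ C / nk[j] + 1e-6 * np.eye(d)
        if abs(ll - prev_ll) < tau:                # stopping condition
            break
        prev_ll = ll
    return theta, mu, sigma

V = np.random.default_rng(5).standard_normal((300, 2))
theta, mu, sigma = em_gmm(V, k=2)
print(theta)
```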

Quantum Expectation-Maximization

\( t = 0 \)

Repeat:

  • Expectation: create the mapping \[ U_R |i,j\rangle|0\rangle \mapsto |i,j\rangle |r_{ij} ^t\rangle \]

  • Maximization:

    • Use \(U_R \) to generate states proportional to \(\theta^{t+1}, \mu^{t+1}, \Sigma^{t+1} \)

    • Perform tomography and create quantum access.

  • \( t \leftarrow t+1 \)

Until  \( | \ell(\gamma^{t-1};V) - \ell(\gamma^t;V) | < \tau \)

Quantum Expectation-Maximization

Repeat:

  • Expectation:   \[ |r_{ij}^t\rangle \leftarrow \frac{\theta^t_j N(v_i; \mu^t_j, \Sigma^t_j )}{\sum_{l=1}^k \theta^t_l N(v_i; \mu^t_l, \Sigma^t_l)}\]

  • Maximization:

    \( |\theta_j^{t+1}\rangle \leftarrow \frac{1}{n}\sum_{i=1}^n r^{t}_{ij} \)

    \(  |\mu_j^{t+1}\rangle \leftarrow \frac{\sum_{i=1}^n r^{t}_{ij} v_i }{ \sum_{i=1}^n r^{t}_{ij}}\)

    \( |\Sigma_j^{t+1}\rangle \leftarrow \frac{\sum_{i=1}^n r^{t}_{ij} (v_i - \mu_j^{t+1})(v_i - \mu_j^{t+1})^T }{ \sum_{i=1}^n r^{t}_{ij}} \)

  • \( t \leftarrow t+1\)

Until  \( | \overline{\ell(\gamma^{t-1};V)} - \overline{\ell(\gamma^t;V)} | < \tau \)


Quantum Expectation-Maximization for Gaussian mixture models

Theorem: Given quantum access to a matrix \( X \in \mathbb{R}^{n \times d} \), there is a quantum EM algorithm that fits a GMM in time:

\( O\left( d^2k^{4.5}  \gamma(X)\textcolor{red}{\log n}\right) \),    where  \(\gamma(X)= O\left( \frac{\eta^3 \kappa(X)\kappa^2(\Sigma)\mu(\Sigma)\mu(X)}{\delta^3} \right) \)

Classical: \( O(d^2 k n) \)

[QEM]

We get high accuracy with a fast quantum classifier

Speaker recognition problem on the VoxForge dataset:

  • Classical ML accuracy: 169/170

  • Quantum ML accuracy: 167/170

  • Max element of \( \Sigma_j^{-1}\) set to \(5\)  via \(\kappa = \frac{1}{\lambda_\tau} \)


Quantum Spectral Sums

Given \(A \in \mathbb{R}^{n \times n} \) SPD and \( f : \mathbb{R} \to \mathbb{R}  \):

\(S_f(A) = \sum_i^n f(\lambda_i) \)

(or, for general matrices, over the singular values: \(S_f(A) = \sum_i^n f(\sigma_i) \))

Quantum algorithms for the log-determinant

\(  S_{\log}(A) =\log\det(A) = \sum_i^n \log(\lambda_i) \)

Theorem: Given quantum access to an SPD matrix \( A \) with \(\|A\| < 1 \) and \(\epsilon \in(0,1)\), there is a quantum algorithm that estimates \( \log\det(A)\) with relative error \(\epsilon \) w.h.p. in time \( \widetilde{O}({\mu(A) \kappa(A)}/{\epsilon})\).

Application: Tyler's M-estimator.

[QSS]
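The estimated quantity, computed classically: for SPD \(A\), \(\log\det(A) = \sum_i \log(\lambda_i)\); numpy's slogdet computes it stably. A small check under the theorem's assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T
A = 0.9 * A / np.linalg.norm(A, 2)             # SPD with ||A|| < 1

eigs = np.linalg.eigvalsh(A)
assert eigs.min() > 0                          # SPD check
sign, logdet = np.linalg.slogdet(A)
print(np.allclose(logdet, np.log(eigs).sum())) # True
```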

Tyler's M-estimator

In many cases \(C = X^TX\) is not a "good" sample covariance matrix. Tyler's M-estimator is the fixed point

\(\Gamma_* \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{x_ix_i^T}{x_i^T\Gamma_*^{-1}x_i}  \)

  • Data from sub-Gaussian distributions.

  • Robust to outliers.

  • Valid for data \( X \in \mathbb{R}^{n \times d}\) with \( n,d \to \infty \).

[Thesis, AQSS]

Tyler's M-estimator

Iterate:

\( \Gamma_{k+1} = \sum_{i=1}^n \frac{x_ix_i^T}{ x_i^T \Gamma_k^{-1}x_i} \Big/ \mathrm{Tr}\Big[\sum_{i=1}^n \frac{x_ix_i^T}{x_i^T \Gamma_k^{-1}x_i}\Big] \)

Stopping condition: log-likelihood with a log-determinant.

Might benefit from componentwise thresholding.

Runtime:

\[ \tilde O\left(\textcolor{red}{d^2}\frac{\mu(X)\kappa(\Sigma_k)\mu(\Sigma_k)}{\epsilon^3}\gamma\right) \]

Classical: \( O\left( d^2n \right) \)

Goes, J., et al. - The Annals of Statistics - 2020.

[Thesis, AQSS]
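A minimal classical sketch of the fixed-point iteration above (the quantum algorithm accelerates the per-iteration linear algebra; helper names are mine):

```python
import numpy as np

def tyler(X, iters=100, tol=1e-8):
    n, d = X.shape
    Gamma = np.eye(d)
    for _ in range(iters):
        # weights 1 / (x_i^T Gamma^{-1} x_i) for every row
        w = 1.0 / np.einsum('ij,jk,ik->i', X, np.linalg.inv(Gamma), X)
        S = (X * w[:, None]).T @ X            # sum_i x_i x_i^T / (x_i^T G^-1 x_i)
        new = S / np.trace(S)                 # normalize by the trace
        if np.linalg.norm(new - Gamma) < tol:
            return new
        Gamma = new
    return Gamma

X = np.random.default_rng(7).standard_normal((500, 5))
Gamma = tyler(X)
print(np.trace(Gamma))                        # 1.0 by construction
```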

Other spectral sums and applications (not in the thesis)

  • Schatten p-norm: \(O(2^{p/2}\mu(A)(p+\kappa(A))\sqrt{n}/\epsilon) \)

  • Von Neumann entropy: \(O(\mu(A)\kappa(A)n /\epsilon )\)

  • Trace of the inverse: \(O(\mu^2(A) \kappa^2(A)/\epsilon) \)

Applications:

  • Counting spanning trees

  • Counting triangles

  • Estimating effective resistances

  • Training Gaussian processes

  • ...

[QSS]

Thanks

Elham Kashefi
Iordanis Kerenidis
Frédéric Magniez
Filippo Miatto
Simon Perdrix
Simone Severini

Conclusions and outlook

  • We have a corpus of algorithms with provable speedups.

    • It is simple to extend the current algorithms to more powerful models.

  • Quantum algorithms seem to work promisingly well in ML: the runtimes depend on parameters such as \(\kappa(A) \), \(\mu(A), s, \eta, \epsilon \) rather than on the dataset size.

  • QML might allow solving new or existing problems:

    • better, faster, cheaper, or a combination.

In a glorious future, with fault-tolerant quantum computers and quantum access to data:

  • Artificial Intelligence might be a promising domain to explore.

    • Smaller QRAM?

  • We should work directly on state-of-the-art ML algorithms:

    • Interpretable, explainable, fair, robust, privacy-preserving ML.

Thanks for your time, there is never enough.

Runtimes of CA and LSA

Quantum correspondence analysis

\(    \widetilde{O}\left( \frac{1}{\epsilon\gamma^2} + \frac{k\textcolor{red}{(n+m)}}{\theta\epsilon\delta^2}\right)  \)

Quantum latent semantic analysis:

\(    \widetilde{O}\left(\left( \textcolor{blue}{\frac{1}{\epsilon\gamma^2}} + \frac{k\textcolor{red}{(n+m)}}{\theta\epsilon\delta^2}\right)\mu(A) \right) \)

Armando Bellante - Master's thesis

Presented at the Quantum Natural Language Processing Conference 2020

Correspondence Analysis (CA)

Consider two categorical random variables \(X, Y\), and let \(C\) be the matrix of occurrences:

\( \hat{P}_{X,Y} = \frac{C}{\sum_{i=1}^{|X|} \sum_{j=1}^{|Y|} c_{ij}} = \frac{1}{n}C \)

\(\hat{p}_{X} = \hat{P}_{X,Y}1_{|Y|} \) and \(\hat{p}_{Y} = 1_{|X|}^T\hat{P}_{X,Y} \)

\( Q = diag(p_X)^{-1/2}(P_{X,Y} - p_Xp_Y^T)diag(p_Y)^{-1/2} = U\Sigma V^T \)

Orthogonal factors:

  • \(F_X = diag(p_X)^{-1/2}U^{(k)}\)
  • \(F_Y = diag(p_Y)^{-1/2}V^{(k)}\)

Factor scores: \( \lambda_i =\sigma_i^2  \)

Factor score ratios: \( \lambda^{(i)}= \frac{\lambda_i}{\sum_j^r\lambda_j} = \frac{\sigma_i^2}{\sum_j^r\sigma_j^2}\)
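A minimal classical sketch of CA on a contingency table \(C\), following the formulas above:

```python
import numpy as np

def correspondence_analysis(C, k):
    P = C / C.sum()                                    # P_hat_{X,Y}
    p_x, p_y = P.sum(axis=1), P.sum(axis=0)            # marginals
    Dx = np.diag(1.0 / np.sqrt(p_x))
    Dy = np.diag(1.0 / np.sqrt(p_y))
    Q = Dx @ (P - np.outer(p_x, p_y)) @ Dy
    U, s, Vt = np.linalg.svd(Q)
    Fx = Dx @ U[:, :k]                                 # orthogonal factors
    Fy = Dy @ Vt.T[:, :k]
    ratios = s**2 / (s**2).sum()                       # factor score ratios
    return Fx, Fy, ratios[:k]

C = np.array([[30.0, 10.0, 5.0], [10.0, 40.0, 10.0], [5.0, 10.0, 30.0]])
Fx, Fy, ratios = correspondence_analysis(C, k=2)
print(ratios)
```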

 

 

Latent Semantic Analysis (LSA)

Term-document matrix \(A\), with rows indexed by words \(w_1, \dots, w_n\) and columns by documents \(d_1, \dots, d_m\):

\( A = U \Sigma V^T \)

Comparing words: \(AA^T = U\Sigma^2U^T\),  \(L = U^{(k)}\Sigma^{(k)}\)

Comparing docs: \(A^TA = V\Sigma^2V^T\),  \(R = V^{(k)}\Sigma^{(k)}\)

Comparing words & docs: \(A = U\Sigma V^T\),  \(L' = U^{(k)}\Sigma^{(k)1/2}\),  \(R' = V^{(k)}\Sigma^{(k)1/2}\)
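And the corresponding LSA sketch: a rank-\(k\) truncated SVD of the term-document matrix, giving the word and document representations \(L\) and \(R\):

```python
import numpy as np

def lsa(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]            # word representations  U^(k) Sigma^(k)
    R = Vt.T[:, :k] * s[:k]         # document representations
    return L, R

A = np.random.default_rng(8).random((20, 8))   # 20 words x 8 documents
L, R = lsa(A, k=3)
# similar words have high inner product: (L L^T) approximates A A^T at rank k
print(L.shape, R.shape)
```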

Evolution of Mutual Information between layers while training a QNN

The dropout technique for avoiding barren plateaus

Rebecca Erbanni - Master's thesis

Alexander Singh - IRIF internship

Counting triangles

\[ \Delta(G) = \frac{1}{6} \mathrm{Tr}[A^3] \]

  • Create a block-encoding of \(B=A^{1.5}\)

  • Estimate \(\mathrm{Tr}[B^TB] \)

in time \(\widetilde{O} \Big( \frac{n^{1/2}s^2(A) \kappa (A) }{\sqrt{\Delta(G)}\epsilon} \Big) = \widetilde{O} \Big( \frac{m^{1/2}s^{1.5}(A) \kappa (A) }{\sqrt{\Delta(G)}\epsilon} \Big) \)

Compare with \( \widetilde{O}\left(  \left( \frac{n^{1/2}}{\Delta^{1/6}(G)}  + \frac{m^{3/4}}{\sqrt{\Delta(G)}}  \right) \cdot \text{poly}{(1/\epsilon)}     \right)  \)

Van Apeldoorn, Joran, et al. - 58th Annual Symposium on Foundations of Computer Science - 2017.

Hamoudi, Y., F. Magniez - 46th International Colloquium on Automata, Languages, and Programming - 2019.
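As a sanity check, the classical identity being estimated:

```python
import numpy as np

def count_triangles(A):
    """Triangles of a simple graph: Tr[A^3] / 6."""
    return int(round(np.trace(A @ A @ A) / 6))

# Adjacency matrix of a 4-cycle plus one chord (two triangles).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
print(count_triangles(A))   # 2
```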

Canonical Correlation Analysis

Input: \( \{ (x_i, y_i) \}_{i=0}^n\), where \( x_i \in \mathbb{R}^{d_1}\) and \(y_i \in \mathbb{R}^{d_2}\), i.e. matrices \(X,Y\)

CCA model: find \(w_x, w_y\) such that

\( {w_x, w_y} =\arg\max_{w_x, w_y} \cos(Xw_x, Yw_y) \)

Classical:

  • Step 1: Solve the GEP \[ \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}w_x = \lambda^2\Sigma_{XX}w_x \]

  • Step 2: Find \(w_y \) as \( w_y = \frac{\Sigma_{YY}^{-1}\Sigma_{YX}w_x}{\lambda} \)

Quantum:

\( \Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1/2} = U\Sigma V^T \)

\( W_x = \Sigma_{XX}^{-1/2}U \),    \( W_y = \Sigma_{YY}^{-1/2}V \)
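A minimal classical sketch of CCA via the whitened cross-covariance SVD (the "Quantum:" factorization above), assuming centered data; helper names are mine:

```python
import numpy as np
from scipy.linalg import sqrtm

def cca(X, Y, k):
    n = len(X)
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Sxx_ih = np.linalg.inv(sqrtm(Sxx).real)   # Sigma_XX^{-1/2}
    Syy_ih = np.linalg.inv(sqrtm(Syy).real)   # Sigma_YY^{-1/2}
    U, s, Vt = np.linalg.svd(Sxx_ih @ Sxy @ Syy_ih)
    Wx = Sxx_ih @ U[:, :k]
    Wy = Syy_ih @ Vt.T[:, :k]
    return Wx, Wy, s[:k]                      # s = canonical correlations

rng = np.random.default_rng(9)
Z = rng.standard_normal((200, 2))                 # shared latent signal
X = Z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((200, 5))
Y = Z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((200, 4))
X -= X.mean(0); Y -= Y.mean(0)
Wx, Wy, corrs = cca(X, Y, k=2)
print(corrs)                                      # close to 1
```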

Quantum algorithms for model checking

A state-space exploration approach:

  • Formalize our software as an automaton \(A_P\).

  • For a temporal property \(f\), build the automaton \(A_{\neg f}\).

  • Solve the emptiness problem of the language:  \[L(A_P\times A_{\neg f}) = \emptyset\]

Software \(\mapsto\) LTL specification \(\mapsto\) Büchi automata \(\mapsto\) \(\omega\)-language

Theorem: The emptiness problem for \(\omega\)-languages is decidable!

Idea: use quantum DFS, in time \( \widetilde{O}\left( \sqrt{VE} \right) \)

Dürr, Christoph, et al. - SIAM Journal on Computing 35.6 - 2006.
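For context, the classical baseline such a quantum search would accelerate is the nested-DFS emptiness check for Büchi automata: the language is nonempty iff an accepting state is reachable and lies on a cycle. A minimal sketch (my own illustration, not taken from the references above):

```python
# Nested-DFS Buechi emptiness: L(A) != {} iff an accepting state is
# reachable and lies on a cycle. A quantum version would accelerate the
# underlying graph search (cf. Duerr et al.).

def buechi_is_empty(initial, succ, accepting):
    visited, visited2 = set(), set()

    def has_cycle_through(seed, s):
        # inner DFS: look for a path back to the accepting seed state
        for t in succ(s):
            if t == seed:
                return True
            if t not in visited2:
                visited2.add(t)
                if has_cycle_through(seed, t):
                    return True
        return False

    def dfs(s):
        visited.add(s)
        for t in succ(s):
            if t not in visited and dfs(t):
                return True
        # post-order: launch the inner search from accepting states
        return accepting(s) and has_cycle_through(s, s)

    return not dfs(initial)

# Toy automaton: 0 -> 1 -> 2 -> 1, with state 2 accepting (on a cycle).
edges = {0: [1], 1: [2], 2: [1]}
print(buechi_is_empty(0, lambda s: edges[s], lambda s: s == 2))  # False
```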