Quantum algorithms for machine learning

PhD thesis defense of

Alessandro Luongo

20 November 2020

Supervisor: Iordanis Kerenidis

Co-supervisor: Frédéric Magniez.

aluongo@irif.fr

This thesis addresses the question:

Is machine learning a promising domain for quantum algorithms*?

* to be run on fault-tolerant quantum computers with quantum access to classical data

Runtime

Input \( X \in \mathbb{R}^{n \times d} \text{ with } n \gg d\)

  • Worst-case classical algorithms: \( O\left(nd^2 \right) \)

  • Randomized classical algorithms: \( O\left(\|X\|_0 \textcolor{red}{\times} \text{poly}( \kappa(X), \epsilon, ...) \right) \)

  • Quantum algorithms: \( O\left(\|X\|_0 \right) \textcolor{red}{+} O\left( \text{poly}(\kappa(X), \epsilon, \mu(X), ...) \right) \)


The end of Moore's law?

Size of benchmark datasets over time: the share of respondents whose largest analyzed dataset was in the GB range was 52.8% (2013), 54.3% (2014), 55.6% (2015), and 56.0% (2018).

What was the largest dataset you analyzed?

https://www.kdnuggets.com/2018/10/poll-results-largest-dataset-analyzed.html

Quantum!


Contributions

  • Quantum classification of the MNIST dataset via slow feature analysis. I. Kerenidis, AL - PRA. [QSFA] (supervised ML, dimensionality reduction, classification, experiments on real data)

  • q-means: A quantum algorithm for unsupervised machine learning.

    I. Kerenidis, J. Landman, AL, A. Prakash - NeurIPS2019 [QMEANS] (unsupervised ML, clustering, experiments on real data)

  • Quantum Expectation-Maximization for Gaussian mixture models.

    I. Kerenidis, AL, A. Prakash - ICML2020 [QEM] (unsupervised ML, clustering, experiments on real data)

  • Quantum algorithms for spectral sums.  C. Shao, AL - arXiv:2011.06475 [QSS] (quantum algorithms numerical linear algebra)

  • Application of quantum algorithms for spectral sums. AL - (to appear) [AQSS] (statistics, applications)

  • Quantum algorithms for data representation. A. Bellante, AL - (to appear) [QADR] (natural language processing, experiments on real data, eigenvalue problems)


The QML toolkit

  • Query access to matrices

  • Quantum linear algebra

  • Distance estimations

  • Singular Value Estimation

  • Tomography (of pure states)

  • Hamiltonian simulation

  • Amplitude estimation and amplification

  • Singular value transformations

  • Polynomial approximations

  • ...

Quantum access to a matrix

  • Classical preprocessing time: \( O(\textcolor{red}{nd} \log nd) \)

  • Classical space: \( O\left( \textcolor{red}{nd} \log nd \right) \)

  • Query time: \( O(\textcolor{red}{\log nd}) \)

  • Quantum space: \( O \left(\textcolor{red}{ \log nd }\right) \)

 Iordanis Kerenidis, Anupam Prakash - 8th Innovations in Theoretical Computer Science Conference - ITCS 2017.

Anupam Prakash - Quantum algorithms for linear algebra and machine learning. Diss. UC Berkeley - 2014.

Quantum query:

\( |i\rangle |0 \rangle \mapsto |i\rangle |x_i\rangle \),  and in superposition:  \( \frac{1}{\sqrt{n}}\sum_i |i\rangle |0 \rangle \mapsto \frac{1}{\sqrt{n}}\sum_i |i\rangle |x_i\rangle \)

where \(|x_i\rangle=\frac{1}{\|x_i\|_2} x_i = \frac{1}{\|x_i\|_2} \sum_j (x_i)_j |j\rangle \)

\( X \in \mathbb{R}^{\textcolor{red}{n \times d}} = [x_1, \dots, x_n]^T \)    \( x_i \in \mathbb{R}^d\)

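As an illustration, here is a minimal classical sketch of the tree data structure behind these costs: for each row \(x_i\), a complete binary tree stores the squared entries at the leaves and partial sums at internal nodes, so preprocessing costs \(O(nd \log nd)\) and the values needed to prepare \(|x_i\rangle\) sit on a root-to-leaf path of length \(O(\log d)\). The helper names are mine and signs/phases are omitted; this is a sketch, not the exact ITCS 2017 construction.

```python
import numpy as np

def build_row_tree(x):
    """Binary tree over the squared entries of a row x (root = ||x||^2)."""
    d = 1 << int(np.ceil(np.log2(len(x))))        # pad to a power of two
    level = np.zeros(d)
    level[:len(x)] = np.asarray(x, dtype=float) ** 2
    levels = [level]
    while len(level) > 1:
        level = level.reshape(-1, 2).sum(axis=1)  # parent = sum of children
        levels.append(level)
    return levels[::-1]                           # levels[0] is the root

def rotation_angles(levels, j):
    """Angles a state-preparation circuit would apply on the path to leaf j."""
    angles = []
    for depth in range(len(levels) - 1):
        parent = j >> (len(levels) - 1 - depth)   # ancestor of leaf j
        left = levels[depth + 1][2 * parent]      # mass of the left child
        total = levels[depth][parent]
        angles.append(np.arccos(np.sqrt(left / total)) if total > 0 else 0.0)
    return angles

x = np.array([0.5, 0.5, 0.5, 0.5])
levels = build_row_tree(x)
print(levels[0][0])                # ||x||^2 = 1.0
print(rotation_angles(levels, 2))  # two pi/4 rotations for this uniform row
```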

Quantum linear systems of equations

Given:

  • quantum (sparse) access to \(A \in \mathbb{R}^{n \times n}\),

  • and a vector \(x \in \mathbb{R}^n\),

the HHL algorithm produces a state \(|z\rangle\) such that

\( \| |z\rangle - |\textcolor{red}{A^{-1}x}\rangle \| \leq \epsilon \)

in time \( \widetilde{O}\left(\frac{\kappa^2(A)s^2(A)}{\epsilon}\right) \) with sparse access, or \( \widetilde{O}(\|A\|_0) + \widetilde{O}\left(\frac{\kappa^2(A)\mu^2(A)}{\epsilon}\right) \) with quantum access,

where \( \kappa(A) = \frac{\sigma_1(A)}{\sigma_n(A)} \) is the condition number and \( s \) is the maximum row sparsity.

Harrow, Aram, Avinatan Hassidim, Seth Lloyd - Physical Review Letters - 2009.
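As a point of reference, a small classical illustration of what the guarantee refers to: the output state encodes the normalized solution \(A^{-1}x/\|A^{-1}x\|\), not the full vector, and the runtime is governed by \(\kappa(A)\). All names below are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
B = rng.standard_normal((n, n))
A = np.eye(n) + 0.1 * (B + B.T) / 2     # symmetric, well-conditioned
x = rng.standard_normal(n)

z = np.linalg.solve(A, x)               # A^{-1} x
z_state = z / np.linalg.norm(z)         # what |z> encodes

sigma = np.linalg.svd(A, compute_uv=False)
kappa = sigma[0] / sigma[-1]            # condition number kappa(A)
print(kappa)                            # runtime scales with kappa^2
```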


Quantum singular value transformations

Given a matrix \(A \in \mathbb{R}^{n \times m}\) and a vector \(x \in \mathbb{R}^m\), it is possible to produce a state \(|z\rangle\) s.t.

  • \( \| |z\rangle - |\textcolor{red}{Ax}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\kappa(A)\mu(A){\log (1/\epsilon)}) \)

  • \( \| |z\rangle - |\textcolor{red}{A^{-1}x}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\kappa(A)\mu(A){\log (1/\epsilon)}) \)

  • \( \| |z\rangle - |\textcolor{red}{A_{k}^{\dagger}A_{k}x}\rangle \| \leq \epsilon \) in \( \widetilde{O}\left( \frac{ \kappa(A)\mu(A)}{\delta\epsilon} \right) \)

In general:

\( \| |z\rangle - |\textcolor{red}{f(A)x}\rangle \| \leq \epsilon \) in \( \widetilde{O}(\text{poly}(\kappa(A), \mu(A)){\log (1/\epsilon)}) \)

where \(f(A) = \sum_i^d f(\sigma_i)|u_i\rangle \langle v_i| \) and

\(\mu(A) = \min\left(\|A\|_F, \sqrt{\max_{i \in [n]} \|a_i\|_{2p}^{2p} \max_{i \in [d]} \|a_{*i}\|_{2(1-p)}^{2(1-p)} }\right) \)

Iordanis Kerenidis, Anupam Prakash - 8th Innovations in Theoretical Computer Science Conference - 2017.

András Gilyén, et al. - Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing - 2019.

Guang Hao Low, Isaac L. Chuang - Physical Review Letters - 2017.
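Classically, a singular value transformation just applies \(f\) to the singular values. A minimal numpy sketch (using a symmetric matrix, where \(U = V\) up to signs and \(f(\sigma) = 1/\sigma\) really recovers the inverse):

```python
import numpy as np

def svt(A, f):
    """Classical analogue of QSVT: f(A) = sum_i f(sigma_i) |u_i><v_i|."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(f(s)) @ Vt

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric positive definite
x = np.array([1.0, 1.0])

z = svt(A, lambda s: 1.0 / s) @ x  # f(s) = 1/s: the A^{-1}x case above
z_state = z / np.linalg.norm(z)    # what the quantum state |z> encodes
print(np.allclose(A @ z, x))       # True
```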

Distance estimation subroutines

\( X \in \mathbb{R}^{n \times d} = [x_1, \dots, x_n]^T \)    \( x_i \in \mathbb{R}^d\)

  • Euclidean distance [QMEANS]: \(|i,j\rangle \mapsto |i,j ,\|x_i- x_j\|_2\rangle  \) in \( \widetilde{O}(\frac{1 }{\epsilon} ) \)

  • Distance induced by \(A\) [QEM]: \(|i,j\rangle \mapsto |i,j ,d_A(x_i,  x_j) \rangle  \) in \( \widetilde{O}(\frac{\mu(A)}{\epsilon} ) \)

  • Quadratic forms [Thesis]: \(|i,j\rangle \mapsto |i,j ,x_i^TA^\textcolor{orange}{-1}x_j \rangle  \) in \( \widetilde{O}(\frac{\mu(A)\textcolor{orange}{\kappa(A)}}{\epsilon} ) \)
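For reference, the classical quantities these subroutines estimate (reading \(d_A\) as the quadratic-form distance induced by \(A\) is my illustration; [QEM] defines the exact metric):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x_i, x_j = rng.standard_normal(d), rng.standard_normal(d)
B = rng.standard_normal((d, d))
A = B @ B.T + np.eye(d)                          # SPD, so A^{-1} exists

euclid = np.linalg.norm(x_i - x_j)               # ||x_i - x_j||_2
d_A = np.sqrt((x_i - x_j) @ A @ (x_i - x_j))     # a distance induced by A
quad = x_i @ np.linalg.solve(A, x_j)             # x_i^T A^{-1} x_j
print(euclid, d_A, quad)
```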

Tomography of pure states

We can produce an estimate \(\overline{x}\) of \(|x\rangle\) such that \(\|\overline{x}\| = 1\) and

  • \( \| \overline{x} - |x\rangle \|_{\textcolor{red}{2}} \leq \epsilon \) using \( O\left(\frac{d\log d}{\epsilon^2}\right) \) samples, or

  • \( \| \overline{x} - |x\rangle \|_{\textcolor{red}{\infty}} \leq \epsilon \) using \( O\left(\frac{\log d}{\epsilon^2}\right) \) samples.

Iordanis Kerenidis, Anupam Prakash - ACM Transactions on Quantum Computing - 2020.

Iordanis Kerenidis, Anupam Prakash, Jonas Landman - International Conference on Learning Representations - 2019.
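A toy classical simulation of the \(\ell_\infty\) guarantee: sample the state \(N = O(\log d/\epsilon^2)\) times in the computational basis and estimate the amplitude magnitudes. Sign recovery, which the real algorithm also handles, is omitted, and the constant 36 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 256
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                       # the unknown state |x>

eps = 0.05
N = int(36 * np.log(d) / eps**2)             # O(log d / eps^2) samples
p = x**2
p /= p.sum()                                 # measurement probabilities
counts = np.bincount(rng.choice(d, size=N, p=p), minlength=d)
x_bar = np.sqrt(counts / N)                  # entrywise estimate of |x_k|

print(np.max(np.abs(x_bar - np.abs(x))))     # l_inf error, well below eps
```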

SVE of product of matrices

Theorem: Assume quantum access to \(A, B \in \mathbb{R}^{n \times n} \).

There is an algorithm that performs the mapping

\( \sum_i \alpha_i |i\rangle \mapsto \sum_i \alpha_i|i, \sigma_i \rangle \)

where \(\sigma_i \) is the i-th singular value of \(AB\) in time

\(O\left(\frac{(\kappa(A)+\kappa(B))( \mu(A)+\mu(B))}{\epsilon}\right) \)

Shantanav Chakraborty, et al. - 46th International Colloquium on Automata, Languages, and Programming - 2019.

[QSFA]

Supervised learning

  • \(X \in \mathbb{R}^{n \times d} \) dataset

  • \( L \in [K]^{n} \) labels

(classification)

Quantum supervised learning

  • Quantum slow feature analysis for dimensionality reduction

  • Quantum Frobenius distance classifier: a simple NISQ classifier

  • Simulation on MNIST dataset of handwritten digits

  • Extension to other generalized eigenvalue problems in ML

[QSFA, Thesis]

https://cmp.felk.cvut.cz/cmp/software/stprtool

Slow Feature Analysis

\(X \in \mathbb{R}^{n \times \textcolor{orange}{d}} \) (images)    \(L \in [K]^{n}  \) (labels)

\(Y \in \mathbb{R}^{n \times \textcolor{orange}{K}} \),    \( y_i =\left[ w_1^Tx_i, \dots, w_K^Tx_i\right] \)

Finding the model \( \{w_j\}_{j=1}^K \) reduces to a constrained optimization problem.

Constraints:

  • The componentwise average should be zero: \( \langle Y_{*j}\rangle =0 \)

  • The componentwise variance should be \(1\): \( \langle Y^2_{ij}\rangle = 1 \)

  • Signals are maximally uncorrelated: \( \forall j' <  j : \langle Y_{*j'} Y_{*j} \rangle  = 0 \)

Slow Feature Analysis

Finding the model \( \{w_j\}_{j=1}^K \) reduces to an optimization problem: minimize, under the constraints above, the slowness

\( \Delta_j =  \frac{1}{a} \sum_{k=1}^K \sum_{\substack{s,t \in T_k \\ s<t}} (w_j^Tx^{(k)}_s - w_j^Tx^{(k)}_t)^2 \)
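Below is a minimal classical sketch of this optimization, using the reduction to the generalized eigenvalue problem \(AW = BW\Lambda\) with \(A = \dot X^T \dot X\) and \(B = X^T X\) (see the GEP slide later). The helper names and the within-class differencing used to build \(\dot X\) are my illustration, not the thesis' exact construction.

```python
import numpy as np
from scipy.linalg import eigh

def sfa(X, labels, K):
    """Supervised SFA: the K generalized eigenvectors with smallest slowness."""
    X = X - X.mean(axis=0)                     # zero componentwise average
    # within-class successive differences play the role of Xdot
    diffs = [np.diff(X[labels == c], axis=0) for c in np.unique(labels)]
    Xdot = np.vstack(diffs)
    A = Xdot.T @ Xdot
    B = X.T @ X
    # eigh solves A w = lambda B w; smallest eigenvalues = slowest features
    eigvals, W = eigh(A, B)
    return W[:, :K]                            # the model {w_j}

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 10))
labels = rng.integers(0, 3, size=120)
W = sfa(X, labels, K=2)
print((X @ W).shape)                           # (120, 2): slow feature space
```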

Quantum Slow Feature Analysis

Theorem:

  • Quantum access to \(X\) and to the derivative matrix \(\dot{X}\)

  • Let \( \epsilon, \theta, \delta, \eta >0\)

     There are quantum algorithms to:

  • Map the dataset into the slow feature space \( |\overline{Y}\rangle \)  in time: $$ \tilde{O}\left(  \frac{ ( \kappa(X) + \kappa(\dot{X})) ( \mu({X})+ \mu(\dot{X}) ) }{\delta\theta}  \gamma_{K-1} \right) $$

  • Find \( \{ w_j\}_{j=0}^{K}\) in time:    $$O\left(d^{1.5}\sqrt{K}\frac{\kappa(X)\kappa(\dot X X)(\mu(X) + \mu(\dot{X}))}{\epsilon^2}\right)$$

[QSFA]

QFDC: classification in slow feature space

Theorem: Assume we can prepare \(|Y\rangle\) in time \(T\); then we can label images into \(k\) classes in time \(O(\frac{kT}{\epsilon}) \).

Definition (QFDC, Quantum Frobenius distance classifier): a point is assigned to the cluster with the smallest normalized average squared distance between the point and the points of the cluster.

[QSFA]

We get high accuracy with a fast quantum classifier

Classification of handwritten digits of MNIST dataset

Ex: polynomial expansion of degree 2:

\([x_1, x_2, x_3] \mapsto [x_1^2, x_1x_2, \dots, x_3^2 ]\)


Malware detection via DGA classification

Accuracy:

                      Original classifier    With slow features
Logistic Regression   89%                    90.5% (+1.5%)
Naive Bayes           89.3%                  92.3% (+3%)
Decision Trees        91.4%                  94.0% (+2.6%)

[Thesis]

  • Using SFA in a classification problem improves its accuracy

  • QSFA can process old datasets in new ways!

SFA as an instance of a more general problem

The GEP (Generalized Eigenvalue Problem) is defined as:

\( AW = BW\Lambda \)

In SFA:

  • \(A=\dot X^T \dot X \)
  • \(B=X^TX\)

Many models in ML reduce to a GEP:

  • ICA Independent Component Analysis

  • G-IBM Gaussian Information Bottleneck Method

  • CCA Canonical Correlation Analysis

  • SC (some) Spectral Clustering [1]

  • PLS Partial Least Squares

  • LE Laplacian Eigenmaps

  • FLD Fisher Linear Discriminant

  • SFA Slow Feature Analysis

  • KPCA Kernel Principal Component Analysis

[1] Iordanis Kerenidis, Jonas Landman - Quantum spectral clustering - arXiv:2007.00280 - 2020.


Unsupervised learning

  • \(X \in \mathbb{R}^{n \times d} \)

(clustering)

Quantum unsupervised learning

  • q-means for clustering (quantum version of k-means)

  • Quantum Expectation-Maximization

  • Simulation on VoxForge dataset for speaker recognition

The k-means algorithm:

\( t \leftarrow 0 \)

Step 1:

  • Compute the distance between every point \(x_i \) and every centroid \(\mu_j^{t} \):  $$ d(x_i, \mu_j^{t}) $$

  •  Assign each point to the closest cluster: $$ l(x_i) = \argmin_{c \in [K]}{ d(x_i, \mu_c^{t})}$$

 Step 2:

  • Compute the barycenters:  $$\mu_j^{t+1} = \frac{1}{|C_j|}\sum_{i \in C_j}^{} x_i $$

\( t \leftarrow t+1 \)

Runtime: \(O(tkdn) \) (a classical sketch follows below)
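A minimal classical k-means sketch matching the two steps above; the \(O(tkdn)\) cost is the distance computation repeated for \(t\) iterations. Helper names are mine.

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # Step 1: distances d(x_i, mu_j) and closest-cluster assignment
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: barycenters mu_j = mean of the points in cluster j
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(K)
        ])
    return centroids, labels

X = np.random.default_rng(4).standard_normal((200, 2))
centroids, labels = kmeans(X, K=3)
print(centroids.shape)   # (3, 2)
```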

q-means

\( t \leftarrow 0 \)

Step 1:

  • Compute the distance between every point \(v_i \) and every centroid \(\mu_j^{t} \):  $$|i,j\rangle \mapsto  |i,j,d(v_i, \mu_j^{t})\rangle $$

  •  Generate the characteristic vector of each cluster: $$ |\chi_j \rangle = \frac{1}{\sqrt{|C_j|}}\sum_{i \in C_j} |i \rangle$$

Step 2:

  • Use quantum linear algebra to build $$|\mu_j^{t+1}\rangle = \frac{1}{|C_j|}\sum_{i \in C_j}^{} |v_i\rangle $$

  • Perform tomography on \( |\mu_j^{t+1}\rangle \)

  • Build quantum access to \( \mu_j\).

\( t \leftarrow t+1 \)

 

q-means

Theorem: Given quantum access to a matrix \(X \in \mathbb{R}^{n \times d} \), there is a quantum algorithm that fits a k-means model, returning centroids with \( \| \mu_j - \mu_j^* \| \leq \delta \), in time:

\(    \widetilde{O}\left( k^2 d \frac{\eta^{2.5}}{\delta^3}  \right) \),    where \( \eta = \max_i (\|x_i\|^2 ) \)

Classical: \( O\left(\textcolor{red}{n}kd\right) \)

[QMEANS]

k-means learns the clusters' barycenters

For a dataset \( \{ x_i \}_{i}^n \), it finds centroids \(\{ \mu_j\}_{j=1}^K\) such that:

\( \{\mu_j\}_{j=1}^K = \text{argmin} \sum_i^n d(x_i, \mu_{l(x_i)}) \)

i.e. it minimizes the distance between points and their cluster's barycenter.

q-means, viewed as Expectation-Maximization: Step 1 (distance estimation and cluster assignment) is the Expectation step; Step 2 (building \( |\mu_j^{t+1}\rangle \) with quantum linear algebra, tomography, and quantum access to \( \mu_j \)) is the Maximization step.

Gaussian mixture models

\( k \) labels, drawn from a multinomial distribution; each component is a Gaussian distribution. Parameters:

\( \gamma = [\theta, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k ] \)

Maximum Likelihood Estimation:

\( \gamma^* =   \text{argmax}_\gamma \prod_i \sum_{j \in [k]} \theta_j p(x_i|\mu_j,\Sigma_j) \)

Error introduced by the quantum algorithm's parameters:

  • \(\|\theta-\overline{\theta}\|<\delta_\theta \)
  • \(\|\mu_j - \overline{\mu_j}\| < \delta_\mu \)
  • \(\|\Sigma_j - \overline{\Sigma_j}\| < \delta_\mu\sqrt{\eta}\)

Expectation-Maximization

\( t = 0 \)

Repeat:

  • Expectation:   \[ r_{ij}^t \leftarrow \frac{\theta^t_j N(v_i; \mu^t_j, \Sigma^t_j )}{\sum_{l=1}^k \theta^t_l N(v_i; \mu^t_l, \Sigma^t_l)}\]

  • Maximization: update the parameters \(\theta, \mu, \Sigma \) using the responsibilities \(r_{ij} \)

  • \( t \leftarrow t+1 \)

Until  \( | \ell(\gamma^{t-1};V) - \ell(\gamma^t;V) | < \tau \)
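For reference, a minimal classical EM sketch for a GMM, matching the loop above (helper names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(V, k, iters=50, tau=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = V.shape
    theta = np.full(k, 1.0 / k)                    # mixing weights
    mu = V[rng.choice(n, k, replace=False)]        # initial means
    sigma = np.array([np.cov(V.T) + 1e-6 * np.eye(d) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(iters):
        # Expectation: responsibilities r_ij
        dens = np.column_stack([
            theta[j] * multivariate_normal.pdf(V, mu[j], sigma[j])
            for j in range(k)
        ])
        ll = np.sum(np.log(dens.sum(axis=1)))      # log-likelihood l(gamma; V)
        r = dens / dens.sum(axis=1, keepdims=True)
        # Maximization: update theta, mu, sigma from r_ij
        nk = r.sum(axis=0)
        theta = nk / n
        mu = (r.T @ V) / nk[:, None]
        for j in range(k):
            C = V - mu[j]
            sigma[j] = (r[:, j, None] * C).T @ C / nk[j] + 1e-6 * np.eye(d)
        if abs(ll - prev_ll) < tau:                # stopping condition
            break
        prev_ll = ll
    return theta, mu, sigma

V = np.random.default_rng(5).standard_normal((300, 2))
theta, mu, sigma = em_gmm(V, k=2)
print(theta)
```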

Quantum Expectation-Maximization

\( t = 0 \)

Repeat:

  • Expectation: create the mapping \[ U_R |i,j\rangle|0\rangle \mapsto |i,j\rangle |r_{ij} ^t\rangle \]

  • Maximization:

    • Use \(U_R \) to generate states proportional to \(\theta^{t+1}, \mu^{t+1}, \Sigma^{t+1} \)

    • Perform tomography and create quantum access.

  • \( t \leftarrow t+1 \)

Until  \( | \ell(\gamma^{t-1};V) - \ell(\gamma^t;V) | < \tau \)

Quantum Expectation-Maximization

Repeat:

  • Expectation:   \[ |r_{ij}^t\rangle \leftarrow \frac{\theta^t_j N(v_i; \mu^t_j, \Sigma^t_j )}{\sum_{l=1}^k \theta^t_l N(v_i; \mu^t_l, \Sigma^t_l)}\]

  • Maximization:

    \( |\theta_j^{t+1}\rangle \leftarrow \frac{1}{n}\sum_{i=1}^n r^{t}_{ij} \)

    \(  |\mu_j^{t+1}\rangle \leftarrow \frac{\sum_{i=1}^n r^{t}_{ij} v_i }{ \sum_{i=1}^n r^{t}_{ij}}\)

    \( |\Sigma_j^{t+1}\rangle \leftarrow \frac{\sum_{i=1}^n r^{t}_{ij} (v_i - \mu_j^{t+1})(v_i - \mu_j^{t+1})^T }{ \sum_{i=1}^n r^{t}_{ij}} \)

  • \( t \leftarrow t+1\)

Until  \( | \overline{\ell(\gamma^{t-1};V)} - \overline{\ell(\gamma^t;V)} | < \tau \)


Quantum Expectation-Maximization for Gaussian mixture models

Theorem: Given quantum access to a matrix \( X \in \mathbb{R}^{n \times d} \), there is a quantum EM algorithm that fits a GMM in time:

\( O\left( d^2k^{4.5}  \gamma(X)\textcolor{red}{\log n}\right) \),    where  \(\gamma(X)= O\left( \frac{\eta^3 \kappa(X)\kappa^2(\Sigma)\mu(\Sigma)\mu(X)}{\delta^3} \right) \)

Classical: \( O(d^2 k n) \)

[QEM]

We get high accuracy with a fast quantum classifier

Speaker recognition problem on the VoxForge dataset:

  • Classical ML accuracy: 169/170

  • Quantum ML accuracy: 167/170

  • Max element of \( \Sigma_j^{-1}\) set to \(5\)  via \(\kappa = \frac{1}{\lambda_\tau} \)


Quantum Spectral Sums

Given \(A \in \mathbb{R}^{n \times n} \) SPD and \( f : \mathbb{R} \to \mathbb{R}  \):

\(S_f(A) = \sum_i^n f(\lambda_i) \)

(or, for general matrices, over the singular values: \(S_f(A) = \sum_i^n f(\sigma_i) \))

Quantum algorithms for the log-determinant

\(  S_{\log}(A) =\log\det(A) = \sum_i^n \log(\lambda_i) \)

Theorem: Given quantum access to an SPD matrix \( A \) with \(\|A\| < 1 \) and \(\epsilon \in(0,1)\), there is a quantum algorithm that estimates \( \log\det(A)\) with relative error \(\epsilon \) w.h.p. in time \( \widetilde{O}({\mu(A) \kappa(A)}/{\epsilon})\).

Application: Tyler's M-estimator.

[QSS]
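The estimated quantity, computed classically: for SPD \(A\), \(\log\det(A) = \sum_i \log(\lambda_i)\); numpy's slogdet computes it stably. A small check under the theorem's assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T
A = 0.9 * A / np.linalg.norm(A, 2)             # SPD with ||A|| < 1

eigs = np.linalg.eigvalsh(A)
assert eigs.min() > 0                          # SPD check
sign, logdet = np.linalg.slogdet(A)
print(np.allclose(logdet, np.log(eigs).sum())) # True
```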

Tyler's M-estimator

In many cases \(C = X^TX\) is not a "good" sample covariance matrix. Tyler's M-estimator is the fixed point

\(\Gamma_* \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{x_ix_i^T}{x_i^T\Gamma_*^{-1}x_i}  \)

  • Data from sub-Gaussian distributions.

  • Robust to outliers.

  • Valid for data \( X \in \mathbb{R}^{n \times d}\) with \( n,d \to \infty \).

[Thesis, AQSS]

Tyler's M-estimator

Iterate:

\( \Gamma_{k+1} = \sum_{i=1}^n \frac{x_ix_i^T}{ x_i^T \Gamma_k^{-1}x_i} \Big/ \mathrm{Tr}\Big[\sum_{i=1}^n \frac{x_ix_i^T}{x_i^T \Gamma_k^{-1}x_i}\Big] \)

Stopping condition: log-likelihood with a log-determinant.

Might benefit from componentwise thresholding.

Runtime:

\[ \tilde O\left(\textcolor{red}{d^2}\frac{\mu(X)\kappa(\Sigma_k)\mu(\Sigma_k)}{\epsilon^3}\gamma\right) \]

Classical: \( O\left( d^2n \right) \)

Goes, J., et al. - The Annals of Statistics - 2020.

[Thesis, AQSS]
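A minimal classical sketch of the fixed-point iteration above (the quantum algorithm accelerates the per-iteration linear algebra; helper names are mine):

```python
import numpy as np

def tyler(X, iters=100, tol=1e-8):
    n, d = X.shape
    Gamma = np.eye(d)
    for _ in range(iters):
        # weights 1 / (x_i^T Gamma^{-1} x_i) for every row
        w = 1.0 / np.einsum('ij,jk,ik->i', X, np.linalg.inv(Gamma), X)
        S = (X * w[:, None]).T @ X            # sum_i x_i x_i^T / (x_i^T G^-1 x_i)
        new = S / np.trace(S)                 # normalize by the trace
        if np.linalg.norm(new - Gamma) < tol:
            return new
        Gamma = new
    return Gamma

X = np.random.default_rng(7).standard_normal((500, 5))
Gamma = tyler(X)
print(np.trace(Gamma))                        # 1.0 by construction
```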

Other spectral sums and applications (not in the thesis)

  • Schatten p-norm: \(O(2^{p/2}\mu(A)(p+\kappa(A))\sqrt{n}/\epsilon) \)

  • Von Neumann entropy: \(O(\mu(A)\kappa(A)n /\epsilon )\)

  • Trace of the inverse: \(O(\mu^2(A) \kappa^2(A)/\epsilon) \)

Applications:

  • Counting spanning trees

  • Counting triangles

  • Estimating effective resistances

  • Training Gaussian processes

  • ...

[QSS]

Thanks

Elham Kashefi
Iordanis Kerenidis
Frédéric Magniez
Filippo Miatto
Simon Perdrix
Simone Severini

Conclusions and outlook

  • We have a corpus of algorithms with provable speedups.

    • It is simple to extend the current algorithms to more powerful models.

  • Quantum algorithms seem to work promisingly well in ML: the runtimes depend on parameters such as \(\kappa(A) \), \(\mu(A), s, \eta, \epsilon \) rather than on the dataset size.

  • QML might allow solving new or existing problems:

    • better, faster, cheaper, or a combination.

In a glorious future, with fault-tolerant quantum computers and quantum access to data:

  • Artificial Intelligence might be a promising domain to explore.

    • Smaller QRAM?

  • We should work directly on state-of-the-art ML algorithms:

    • Interpretable, explainable, fair, robust, privacy-preserving ML.

Thanks for your time, there is never enough.

Runtimes of CA and LSA

Quantum correspondence analysis

\(    \widetilde{O}\left( \frac{1}{\epsilon\gamma^2} + \frac{k\textcolor{red}{(n+m)}}{\theta\epsilon\delta^2}\right)  \)

Quantum latent semantic analysis:

\(    \widetilde{O}\left(\left( \textcolor{blue}{\frac{1}{\epsilon\gamma^2}} + \frac{k\textcolor{red}{(n+m)}}{\theta\epsilon\delta^2}\right)\mu(A) \right) \)

Armando Bellante - Master's thesis

Presented at the Quantum Natural Language Processing Conference 2020

Correspondence Analysis (CA)

Consider two categorical random variables \(X, Y\), and let \(C\) be the matrix of occurrences:

\( \hat{P}_{X,Y} = \frac{C}{\sum_{i=1}^{|X|} \sum_{j=1}^{|Y|} c_{ij}} = \frac{1}{n}C \)

\(\hat{p}_{X} = \hat{P}_{X,Y}1_{|Y|} \) and \(\hat{p}_{Y} = 1_{|X|}^T\hat{P}_{X,Y} \)

\( Q = diag(p_X)^{-1/2}(P_{X,Y} - p_Xp_Y^T)diag(p_Y)^{-1/2} = U\Sigma V^T \)

Orthogonal factors:

  • \(F_X = diag(p_X)^{-1/2}U^{(k)}\)
  • \(F_Y = diag(p_Y)^{-1/2}V^{(k)}\)

Factor scores: \( \lambda_i =\sigma_i^2  \)

Factor score ratios: \( \lambda^{(i)}= \frac{\lambda_i}{\sum_j^r\lambda_j} = \frac{\sigma_i^2}{\sum_j^r\sigma_j^2}\)
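A minimal classical sketch of CA on a contingency table \(C\), following the formulas above:

```python
import numpy as np

def correspondence_analysis(C, k):
    P = C / C.sum()                                    # P_hat_{X,Y}
    p_x, p_y = P.sum(axis=1), P.sum(axis=0)            # marginals
    Dx = np.diag(1.0 / np.sqrt(p_x))
    Dy = np.diag(1.0 / np.sqrt(p_y))
    Q = Dx @ (P - np.outer(p_x, p_y)) @ Dy
    U, s, Vt = np.linalg.svd(Q)
    Fx = Dx @ U[:, :k]                                 # orthogonal factors
    Fy = Dy @ Vt.T[:, :k]
    ratios = s**2 / (s**2).sum()                       # factor score ratios
    return Fx, Fy, ratios[:k]

C = np.array([[30.0, 10.0, 5.0], [10.0, 40.0, 10.0], [5.0, 10.0, 30.0]])
Fx, Fy, ratios = correspondence_analysis(C, k=2)
print(ratios)
```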

 

 

Latent Semantic Analysis (LSA)

Term-document matrix \(A\), with rows indexed by words \(w_1, \dots, w_n\) and columns by documents \(d_1, \dots, d_m\):

\( A = U \Sigma V^T \)

Comparing words: \(AA^T = U\Sigma^2U^T\),  \(L = U^{(k)}\Sigma^{(k)}\)

Comparing docs: \(A^TA = V\Sigma^2V^T\),  \(R = V^{(k)}\Sigma^{(k)}\)

Comparing words & docs: \(A = U\Sigma V^T\),  \(L' = U^{(k)}\Sigma^{(k)1/2}\),  \(R' = V^{(k)}\Sigma^{(k)1/2}\)
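And the corresponding LSA sketch: a rank-\(k\) truncated SVD of the term-document matrix, giving the word and document representations \(L\) and \(R\):

```python
import numpy as np

def lsa(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]            # word representations  U^(k) Sigma^(k)
    R = Vt.T[:, :k] * s[:k]         # document representations
    return L, R

A = np.random.default_rng(8).random((20, 8))   # 20 words x 8 documents
L, R = lsa(A, k=3)
# similar words have high inner product: (L L^T) approximates A A^T at rank k
print(L.shape, R.shape)
```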

Evolution of Mutual Information between layers while training a QNN

The dropout technique for avoiding barren plateaus

Rebecca Erbanni - Master's thesis

Alexander Singh - IRIF internship

Counting triangles

\[ \Delta(G) = \frac{1}{6} \mathrm{Tr}[A^3] \]

  • Create a block-encoding of \(B=A^{1.5}\)

  • Estimate \(\mathrm{Tr}[B^TB] \)

in time \(\widetilde{O} \Big( \frac{n^{1/2}s^2(A) \kappa (A) }{\sqrt{\Delta(G)}\epsilon} \Big) = \widetilde{O} \Big( \frac{m^{1/2}s^{1.5}(A) \kappa (A) }{\sqrt{\Delta(G)}\epsilon} \Big) \)

Compare with \( \widetilde{O}\left(  \left( \frac{n^{1/2}}{\Delta^{1/6}(G)}  + \frac{m^{3/4}}{\sqrt{\Delta(G)}}  \right) \cdot \text{poly}{(1/\epsilon)}     \right)  \)

Van Apeldoorn, Joran, et al. - 58th Annual Symposium on Foundations of Computer Science - 2017.

Hamoudi, Y., F. Magniez - 46th International Colloquium on Automata, Languages, and Programming - 2019.
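As a sanity check, the classical identity being estimated:

```python
import numpy as np

def count_triangles(A):
    """Triangles of a simple graph: Tr[A^3] / 6."""
    return int(round(np.trace(A @ A @ A) / 6))

# Adjacency matrix of a 4-cycle plus one chord (two triangles).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
print(count_triangles(A))   # 2
```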

Canonical Correlation Analysis

Input: \( \{ (x_i, y_i) \}_{i=0}^n\), where \( x_i \in \mathbb{R}^{d_1}\) and \(y_i \in \mathbb{R}^{d_2}\), i.e. matrices \(X,Y\)

CCA model: find \(w_x, w_y\) such that

\( {w_x, w_y} =\arg\max_{w_x, w_y} \cos(Xw_x, Yw_y) \)

Classical:

  • Step 1: Solve the GEP \[ \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}w_x = \lambda^2\Sigma_{XX}w_x \]

  • Step 2: Find \(w_y \) as \( w_y = \frac{\Sigma_{YY}^{-1}\Sigma_{YX}w_x}{\lambda} \)

Quantum:

\( \Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1/2} = U\Sigma V^T \)

\( W_x = \Sigma_{XX}^{-1/2}U \),    \( W_y = \Sigma_{YY}^{-1/2}V \)
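A minimal classical sketch of CCA via the whitened cross-covariance SVD (the "Quantum:" factorization above), assuming centered data; helper names are mine:

```python
import numpy as np
from scipy.linalg import sqrtm

def cca(X, Y, k):
    n = len(X)
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Sxx_ih = np.linalg.inv(sqrtm(Sxx).real)   # Sigma_XX^{-1/2}
    Syy_ih = np.linalg.inv(sqrtm(Syy).real)   # Sigma_YY^{-1/2}
    U, s, Vt = np.linalg.svd(Sxx_ih @ Sxy @ Syy_ih)
    Wx = Sxx_ih @ U[:, :k]
    Wy = Syy_ih @ Vt.T[:, :k]
    return Wx, Wy, s[:k]                      # s = canonical correlations

rng = np.random.default_rng(9)
Z = rng.standard_normal((200, 2))                 # shared latent signal
X = Z @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((200, 5))
Y = Z @ rng.standard_normal((2, 4)) + 0.1 * rng.standard_normal((200, 4))
X -= X.mean(0); Y -= Y.mean(0)
Wx, Wy, corrs = cca(X, Y, k=2)
print(corrs)                                      # close to 1
```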

Quantum algorithms for model checking

A state-space exploration approach:

  • Formalize our software as an automaton \(A_P\).

  • For a temporal property \(f\), build the automaton \(A_{\neg f}\).

  • Solve the emptiness problem of the language:  \[L(A_P\times A_{\neg f}) = \emptyset\]

Software \(\mapsto\) LTL specification \(\mapsto\) Büchi automata \(\mapsto\) \(\omega\)-language

Theorem: The emptiness problem for \(\omega\)-languages is decidable!

Idea: use quantum DFS, in time \( \widetilde{O}\left( \sqrt{VE} \right) \)

Dürr, Christoph, et al. - SIAM Journal on Computing 35.6 - 2006.
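For context, the classical baseline such a quantum search would accelerate is the nested-DFS emptiness check for Büchi automata: the language is nonempty iff an accepting state is reachable and lies on a cycle. A minimal sketch (my own illustration, not taken from the references above):

```python
# Nested-DFS Buechi emptiness: L(A) != {} iff an accepting state is
# reachable and lies on a cycle. A quantum version would accelerate the
# underlying graph search (cf. Duerr et al.).

def buechi_is_empty(initial, succ, accepting):
    visited, visited2 = set(), set()

    def has_cycle_through(seed, s):
        # inner DFS: look for a path back to the accepting seed state
        for t in succ(s):
            if t == seed:
                return True
            if t not in visited2:
                visited2.add(t)
                if has_cycle_through(seed, t):
                    return True
        return False

    def dfs(s):
        visited.add(s)
        for t in succ(s):
            if t not in visited and dfs(t):
                return True
        # post-order: launch the inner search from accepting states
        return accepting(s) and has_cycle_through(s, s)

    return not dfs(initial)

# Toy automaton: 0 -> 1 -> 2 -> 1, with state 2 accepting (on a cycle).
edges = {0: [1], 1: [2], 2: [1]}
print(buechi_is_empty(0, lambda s: edges[s], lambda s: s == 2))  # False
```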