Case-control analysis of single-cell RNA-seq studies

Petukhov V, Rydbirk R, Igolkina A, Mei S, Kharchenko P, Khodosevich K

viktor.petukhov@pm.me

Description of the problem

Two conditions, multiple samples per condition, multiple cells per sample

Case samples

Control samples

What cell types are affected and how?

Prepare for further experiments:

  • Which subtypes should we focus on?
  • Which genes per subtype should we investigate further?

Questions to existing data:

  • Did some cell types change their expression in a similar way?
  • Which genes changed their expression in a similar way?
  • All other patterns in expression changes we can think of

Existing solutions

Align samples

scVI, Conos, ..., Seurat

See the review from the Theis lab

Existing solutions

Align samples

Perform joint annotation

Run differential expression 

Run Gene Ontology analysis

Compare cell type proportions

Why it is not enough

The main questions:

  • What cell types are the most affected?
  • How exactly are they affected?

Number of DE genes is not an answer

There are up to 1000 significant DE genes per type and 2795 unique DE genes in total

There are up to 300 significant GO terms per type and 796 unique terms in total

DE depends on the depth and quality of the annotation

Case-control analysis of single-cell studies: a fresh approach

What can we possibly do?

Gene expression analysis

Compositional analysis

Cluster-based

Cluster-free

control

epilepsy

Composition analysis, cluster-based

Problem: changes are not independent
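
One way to see why changes in cell type proportions are not independent: the proportions within a sample sum to one, so expanding a single cell type lowers every other proportion even when the other absolute counts stay the same. The sketch below uses made-up numbers; the cell type names and counts are hypothetical and serve only as an illustration.

```python
# Hypothetical cell counts for one control and one case sample.
# Only the "microglia" count differs between the two samples.
control = {"neurons": 500, "astrocytes": 300, "microglia": 200}
case = {"neurons": 500, "astrocytes": 300, "microglia": 1200}


def proportions(counts):
    """Convert absolute cell counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


p_control = proportions(control)
p_case = proportions(case)

# Print the proportion of each cell type in control and in case.
for cell_type in control:
    print(f"{cell_type}: {p_control[cell_type]:.2f} -> {p_case[cell_type]:.2f}")
```

Here neurons and astrocytes appear to shrink in proportion although their counts did not change, so testing each cell type separately would report three changes where only one occurred.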

Composition analysis, cluster-free

Control

Multiple sclerosis

Wilcoxon test between samples
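
A between-sample Wilcoxon comparison could look roughly like the sketch below. It assumes that a per-sample cell density has already been estimated at one location of the joint embedding; the slides do not say how that is done, so the density values here are hypothetical. The helper `rank_sum_test` is an illustrative standard-library implementation of the Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation, not the implementation used in the talk. In practice an existing routine such as `scipy.stats.mannwhitneyu` would probably be preferable.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties receive average ranks. No tie or continuity correction is applied,
    so the p-value is approximate, especially for few samples.
    """
    values = [(v, 0) for v in x] + [(v, 1) for v in y]
    values.sort(key=lambda pair: pair[0])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, values) if group == 0)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2

    # Normal approximation of the U statistic under the null hypothesis.
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Hypothetical density of cells at one embedding location, one value per sample.
control_density = [0.10, 0.12, 0.09, 0.11]
case_density = [0.20, 0.25, 0.22, 0.19]

u_stat, p_value = rank_sum_test(control_density, case_density)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

Because the unit of replication is the sample rather than the cell, the test sees four values per condition here, not thousands of cells. Repeating it over many locations would also call for multiple-testing correction, which this sketch leaves out.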

Expression analysis, cluster-based

What cell types are affected the most?

Z = \frac{d_{between}}{\overline{d}_{control}}
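
The slide does not define the symbols in the ratio above, so the following is only one possible reading: d_between is taken as the mean expression distance between case and control samples for a given cell type, and the denominator as the mean pairwise distance among control samples. Under that assumption, a minimal sketch with Euclidean distance on hypothetical per-sample expression profiles could be written as follows (the function name `normalized_distance` and the choice of metric are illustrative, not taken from the talk):

```python
import math
from itertools import combinations, product
from statistics import mean


def normalized_distance(case_profiles, control_profiles):
    """Ratio of mean case-control distance to mean control-control distance.

    Each profile is a sequence of expression values for one sample
    (assumed here to be a per-cell-type pseudo-bulk vector).
    """
    d_between = mean(
        math.dist(a, b) for a, b in product(case_profiles, control_profiles)
    )
    d_control = mean(
        math.dist(a, b) for a, b in combinations(control_profiles, 2)
    )
    return d_between / d_control


# Hypothetical two-gene profiles for two control and two case samples.
control_profiles = [(0.0, 0.0), (0.0, 1.0)]
case_profiles = [(3.0, 0.0), (3.0, 1.0)]

z = normalized_distance(case_profiles, control_profiles)
print(f"Z = {z:.2f}")
```

Under this reading, a value near 1 would mean that case samples differ from controls about as much as controls differ from each other, and larger values would point to a more strongly affected cell type.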

Expression analysis, cluster-based

Can we trust differential expression?

False discoveries

Signal

Expression analysis, cluster-free

Differential expression on single cells

control

epilepsy

Differential expression

Expression analysis, cluster-free

Gene programs on single cells

Program 1

Program 2

Expression analysis, cluster-free

Expression distances

Summary

Gene expression analysis

Compositional analysis

Cluster-based

Cluster-free

Thank you!

Konstantin Khodosevich lab

Peter Kharchenko

Co-authors

  • Rasmus Rydbirk
  • Anna Igolkina
  • Shenglin Mei

viktor.petukhov@pm.me

Cacoa, SCS 2020

By Viktor Petukhov
