Case-control analysis of single-cell RNA-seq studies

Petukhov V, Igolkina A, Rydbirk R, Mei S, Christoffersen L, Kharchenko P, Khodosevich K

viktor.petukhov@pm.me

Description of the problem

Two conditions, multiple samples per condition, multiple cells per sample

[Figure panels: Case samples | Control samples]

What is going on in our data?

  • What mechanisms underlie the disease?
  • What parts of the data were affected the most?
  • How can these differences be described?
  • What do these differences mean biologically?

Existing solutions

  • Align samples (scVI, Conos, ..., Seurat; see the review from the Theis lab)
  • Perform joint annotation
  • Run differential expression
  • Run Gene Ontology analysis
  • Compare cell type proportions

Can we do better?

There are up to hundreds of significant GO terms per cell type

There are up to 1000 significant DE genes per cell type

Compositional analysis

Gene expression analysis

Case-control analysis of single-cell studies: a fresh approach

What can we possibly do?

Two axes: gene expression analysis vs. compositional analysis, and cluster-based vs. cluster-free

[Figure panels: Control | Multiple sclerosis]

Composition analysis, cluster-based

Problem: changes in cell-type proportions are not independent. Proportions sum to one, so an expansion of one cell type lowers the fractions of all the others

*Credit to Anna Igolkina
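The dependence can be seen in a toy example. The sketch below assumes three hypothetical cell types (A, B, C) with made-up counts; the helper names are illustrative and are not Cacoa functions. Only C expands in the case sample, yet the proportions of A and B fall as well, while the log-ratio between A and B stays unchanged. This is why compositional data analysis works with log-ratios, such as the centered log-ratio (CLR) shown here.

```python
import numpy as np

def proportions(counts):
    """Normalize cell-type counts so that each sample (last axis) sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=-1, keepdims=True)

def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each part minus the mean log over all parts.
    A positive pseudocount is needed if any count is zero."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical counts for cell types A, B, C
control = np.array([100, 100, 100])
case = np.array([100, 100, 200])  # only C expands

print(proportions(control))  # A, B, C each at 1/3
print(proportions(case))     # A and B drop to 1/4 although their counts did not change

# The A/B log-ratio is unaffected by the expansion of C
print(np.log(case[0] / case[1]) - np.log(control[0] / control[1]))
print(clr(case) - clr(control))
```

Note that the CLR coordinates of A and B still move (by the same amount), so CLR alone does not say which cell type "really" changed; it places the data in a space where standard multivariate statistics apply.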

[Figure panels: Neurons | Glia]

Composition analysis, cluster-free

[Figure panels: Control | Multiple sclerosis]

[Figure panels: Effect size | Significance]
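One way to picture a cluster-free compositional comparison is to estimate, for every sample, the density of its cells on a shared 2D embedding and then compare case and control densities point by point. The sketch below is an assumption about how such effect-size and significance maps could be computed, not necessarily the procedure behind these slides; the function names, the Gaussian KDE and the Mann-Whitney U test are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde, mannwhitneyu

def sample_densities(embedding, sample_ids, grid):
    """2D kernel density of each sample's cells, evaluated on grid points.
    embedding: (n_cells, 2); sample_ids: (n_cells,); grid: (n_points, 2)."""
    return {s: gaussian_kde(embedding[sample_ids == s].T)(grid.T)
            for s in np.unique(sample_ids)}

def density_shift(densities, is_case):
    """Per grid point: effect size (mean case density minus mean control density)
    and a two-sided Mann-Whitney U p-value computed across samples."""
    case = np.array([d for s, d in densities.items() if is_case[s]])
    ctrl = np.array([d for s, d in densities.items() if not is_case[s]])
    effect = case.mean(axis=0) - ctrl.mean(axis=0)
    pvals = np.array([mannwhitneyu(case[:, j], ctrl[:, j], alternative="two-sided").pvalue
                      for j in range(case.shape[1])])
    return effect, pvals

# Toy example: case samples have extra cells in one region of the embedding
rng = np.random.default_rng(0)
emb, ids = [], []
for s in ["ctrl1", "ctrl2", "ctrl3", "case1", "case2", "case3"]:
    pts = rng.normal(0.0, 1.0, size=(300, 2))
    if s.startswith("case"):
        pts[:100] += [2.5, 0.0]  # move a third of the cells to the right
    emb.append(pts)
    ids += [s] * len(pts)
emb, ids = np.vstack(emb), np.array(ids)

xs, ys = np.meshgrid(np.linspace(-3, 5, 20), np.linspace(-3, 3, 15))
grid = np.column_stack([xs.ravel(), ys.ravel()])
dens = sample_densities(emb, ids, grid)
effect, pvals = density_shift(dens, {s: s.startswith("case") for s in dens})
```

With three samples per condition the exact two-sided Mann-Whitney p-value cannot go below 0.1, so in practice more samples or a permutation scheme would be needed for the significance map to be informative.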

Expression analysis, cluster-based

How separated are our conditions?

L2-L3 EN
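For a cluster-based view, a common starting point is one pseudo-bulk profile per sample and cell type, a distance between samples, and a score contrasting between-condition with within-condition distances. The sketch below assumes log-CPM pseudo-bulk profiles and correlation distance; `pseudobulk_logcpm` and `separation_score` are hypothetical helpers, not the method used in the talk.

```python
import numpy as np

def pseudobulk_logcpm(counts, sample_ids):
    """Sum raw counts (cells x genes) per sample; return log1p(CPM), one row per sample."""
    samples = np.unique(sample_ids)
    pb = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, np.log1p(pb / pb.sum(axis=1, keepdims=True) * 1e6)

def separation_score(dist, conditions):
    """Mean between-condition distance minus mean within-condition distance (off-diagonal)."""
    conditions = np.asarray(conditions)
    same = conditions[:, None] == conditions[None, :]
    off_diag = ~np.eye(len(conditions), dtype=bool)
    return dist[~same].mean() - dist[same & off_diag].mean()

# Toy data for one cell type: 3 case and 3 control samples, 100 cells each, 50 genes
rng = np.random.default_rng(1)
base = np.linspace(2.0, 20.0, 50)  # gene-specific baseline expression
counts, ids = [], []
for s in ["case1", "case2", "case3", "ctrl1", "ctrl2", "ctrl3"]:
    rate = base.copy()
    if s.startswith("case"):
        rate[:10] *= 4.0  # first 10 genes up-regulated in case
    counts.append(rng.poisson(rate, size=(100, 50)))
    ids += [s] * 100
counts, ids = np.vstack(counts), np.array(ids)

samples, expr = pseudobulk_logcpm(counts, ids)
dist = 1.0 - np.corrcoef(expr)  # correlation distance between samples
conditions = ["case" if s.startswith("case") else "control" for s in samples]
print(separation_score(dist, conditions))  # positive when samples group by condition
```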

Visualization of sample structure

[Figure panels: L2-L3 EN; colored by batch; aggregated across all cell types]

[Figure: separation between conditions, L2-L3 EN]
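Sample-structure plots like these can be produced by embedding a sample-by-sample distance matrix in 2D. As an assumption about one standard option (not necessarily the method behind the slides), the sketch below implements classical (Torgerson) multidimensional scaling with NumPy; `classical_mds` is a hypothetical helper name.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate `dist`."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # double-centering matrix
    b = -0.5 * j @ d2 @ j                  # Gram matrix of centered coordinates
    vals, vecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Example: four points in the plane are recovered up to rotation/reflection
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
emb = classical_mds(dist)
```

For a correlation-distance matrix the embedding is only approximate, because such distances need not be Euclidean; the clipping of negative eigenvalues handles that case.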

Expression analysis, cluster-free

Differential expression on single cells

[Figure panels: control | epilepsy | differential expression]

[Figure panels: PDE10A (MS | Control); NCKAP5 (MS | Control)]
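Differential expression "on single cells" can be sketched as a local test: for each cell, take its nearest neighbors in the joint embedding, average expression per sample inside that neighborhood, and compare case samples with control samples gene by gene. The Welch-style z-score, the neighborhood size `k` and the `min_cells` threshold below are illustrative assumptions, and `cluster_free_z` is a hypothetical function, not Cacoa's API.

```python
import numpy as np
from scipy.spatial import cKDTree

def cluster_free_z(expr, embedding, sample_ids, is_case, k=50, min_cells=3, eps=1e-6):
    """Per-cell, per-gene z-scores contrasting case and control samples in each
    cell's k-nearest-neighbor neighborhood.
    expr: (n_cells, n_genes) normalized expression; embedding: (n_cells, d);
    sample_ids: (n_cells,) array; is_case: dict sample -> bool."""
    _, nn = cKDTree(embedding).query(embedding, k=k)
    z = np.full(expr.shape, np.nan)  # NaN where a neighborhood has too few samples
    for i, idx in enumerate(nn):
        case_means, ctrl_means = [], []
        for s in np.unique(sample_ids[idx]):
            cells = idx[sample_ids[idx] == s]
            if len(cells) < min_cells:
                continue  # skip samples barely present in this neighborhood
            (case_means if is_case[s] else ctrl_means).append(expr[cells].mean(axis=0))
        if len(case_means) < 2 or len(ctrl_means) < 2:
            continue  # need at least two samples per condition for a variance
        a, b = np.array(case_means), np.array(ctrl_means)
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b) + eps)
        z[i] = (a.mean(axis=0) - b.mean(axis=0)) / se
    return z

# Toy data: 6 samples, gene 0 is up-regulated in case cells everywhere
rng = np.random.default_rng(2)
ids = np.repeat(["case1", "case2", "case3", "ctrl1", "ctrl2", "ctrl3"], 60)
is_case = {s: s.startswith("case") for s in np.unique(ids)}
emb = rng.uniform(0.0, 1.0, size=(len(ids), 2))
expr = rng.normal(0.0, 1.0, size=(len(ids), 5))
expr[np.char.startswith(ids, "case"), 0] += 2.0
z = cluster_free_z(expr, emb, ids, is_case, k=60)
```

Treating samples, not cells, as the replicates inside each neighborhood keeps the local test consistent with the pseudo-bulk logic discussed later in the talk.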

Gene programs
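Given a cells x genes matrix of local z-scores, one simple way to obtain gene programs is to group genes whose z-scores co-vary across cells. The sketch below uses average-linkage hierarchical clustering on correlation distance; this is an illustrative choice, not necessarily the decomposition used in Cacoa, and `gene_programs` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def gene_programs(z, n_programs=2):
    """Cluster genes by the correlation of their per-cell z-score profiles.
    z: (n_cells, n_genes); cells with any NaN are dropped first.
    Returns one integer program label (1..n_programs) per gene."""
    z = z[~np.isnan(z).any(axis=1)]
    tree = linkage(z.T, method="average", metric="correlation")
    return fcluster(tree, t=n_programs, criterion="maxclust")

# Toy z-scores: genes 0-4 follow one pattern across cells, genes 5-9 another
rng = np.random.default_rng(3)
pattern_a, pattern_b = rng.normal(size=(2, 500))
z = np.column_stack([pattern_a + 0.1 * rng.normal(size=500) for _ in range(5)] +
                    [pattern_b + 0.1 * rng.normal(size=500) for _ in range(5)])
labels = gene_programs(z)
```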

Summary

[Overview: gene expression vs. compositional analysis, each cluster-based or cluster-free; panels Control | Multiple sclerosis]

Full analysis of the Epilepsy dataset

Compositional analysis

No significant changes!

Compositional analysis, cluster-free

No significant changes!

[Figure panels: Effect size | Significance]

Expression analysis: sample embeddings

[Figure panels, samples colored by: Condition | Sex | Protocol]

Expression analysis: shift magnitudes

Expression analysis: cluster-free DE

Programs for excitatory neurons

Global excitatory program

Programs for inhibitory neurons

Joint program

Expression analysis: local cluster-free DE

L2_Cux2_Lamp5 programs

GO and DE sugar

DE instability

Use the top DE genes instead of a p-value cutoff!

[Figure panels: Epilepsy dataset | Cancer dataset]
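The difference between the two selection rules can be made concrete: a p-value cutoff yields gene lists whose length depends on statistical power (e.g. the number of cells or samples), while a fixed top-N list always has the same size. The sketch below compares the overlap (Jaccard index) of two hypothetical DE runs under both rules; all scores and p-values are made up for illustration, and the helper names are not from any library.

```python
import numpy as np

def top_n_genes(scores, genes, n=100):
    """Genes with the n largest absolute DE scores (e.g. |log fold change| or |z|)."""
    order = np.argsort(-np.abs(np.asarray(scores)), kind="stable")[:n]
    return set(np.asarray(genes)[order].tolist())

def significant_genes(pvals, genes, alpha=0.05):
    """Genes passing a fixed p-value cutoff; the list length varies with power."""
    return set(np.asarray(genes)[np.asarray(pvals) < alpha].tolist())

def jaccard(a, b):
    """Overlap of two gene sets (1.0 means identical)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Two hypothetical DE runs on the same cell type, the second with less power
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
scores_1 = [4.0, 3.5, 3.0, 2.5, 0.5, 0.1]
scores_2 = [3.8, 3.6, 2.9, 1.0, 0.7, 0.2]
pvals_1 = [1e-6, 1e-5, 1e-4, 1e-3, 0.3, 0.9]
pvals_2 = [1e-3, 2e-3, 0.04, 0.2, 0.5, 0.9]

print(jaccard(top_n_genes(scores_1, genes, 3), top_n_genes(scores_2, genes, 3)))  # 1.0
print(jaccard(significant_genes(pvals_1, genes), significant_genes(pvals_2, genes)))  # 0.75
```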

DE instability

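Use pseudo-bulk DE, not single-cell methods!

Single-cell-based methods fail even on a theoretical level: they treat cells from the same sample as independent replicates and ignore between-sample variability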
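The replicate problem can be simulated directly. Below, a single hypothetical gene has no condition effect at all, only sample-to-sample variability. A naive cell-level t-test counts every cell as a replicate and tends to report a very small p-value, while a pseudo-bulk test on one value per sample does not. The t-test on per-sample means is a simplification for illustration; pseudo-bulk DE in practice typically uses count models such as DESeq2 or edgeR.

```python
import numpy as np
from scipy.stats import ttest_ind

def pseudobulk_means(values, sample_ids):
    """Average one gene's normalized expression over the cells of each sample."""
    samples = np.unique(sample_ids)
    return samples, np.array([values[sample_ids == s].mean() for s in samples])

# Simulated gene with NO condition effect but real sample-to-sample variability
rng = np.random.default_rng(5)
sample_names = ["case1", "case2", "case3", "ctrl1", "ctrl2", "ctrl3"]
sample_effect = dict(zip(sample_names, rng.normal(0.0, 1.0, 6)))
ids = np.repeat(sample_names, 500)
values = rng.normal(np.array([sample_effect[s] for s in ids]), 1.0)
is_case = np.char.startswith(ids, "case")

# Naive single-cell test: 1500 vs 1500 cells treated as independent replicates
p_cells = ttest_ind(values[is_case], values[~is_case]).pvalue

# Pseudo-bulk test: one value per sample, so n = 3 vs 3
samples, means = pseudobulk_means(values, ids)
sample_is_case = np.char.startswith(samples, "case")
p_pseudobulk = ttest_ind(means[sample_is_case], means[~sample_is_case]).pvalue
print(p_cells, p_pseudobulk)
```

Both tests see the same difference in means here (every sample has the same number of cells); they differ only in what is counted as a replicate, which is exactly where the cell-level test goes wrong.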

Thank you!

Konstantin Khodosevich lab

Peter Kharchenko

  • Rasmus Rydbirk
  • Anna Igolkina
  • Shenglin Mei
  • Lars Christoffersen

Co-authors

viktor.petukhov@pm.me

Cacoa, SC Seminar Aug 2021

By Viktor Petukhov
