Viktor Petukhov (1,2), Peter Kharchenko (2,3)
Harvard Stem Cell Institute
Manual annotation is painful!
Based on annotation transfer
Based on marker genes
Annotated cells (e.g. published data)
Not-annotated cells (e.g. your data)
Problems:
*Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling, Nature Methods 2019
Benefits
Drawbacks
Graph diffusion (using Conos routines):
>AT2
expressed: Bex4
>AT1
expressed: Cryab
>Ciliated cells
expressed: Aldh1a1, Cyp2f2
>Interstitial macrophage
expressed: Apoe, Pf4
not expressed: Trbc2
>Alveolar macrophage
expressed: Ear1, Ear2
>T cells
expressed: Cd8b1, Trbc2
>Natural killer cells
expressed: Klra8, Nkg7
not expressed: Trbc2
>Naaa DCs
expressed: Naaa
>Mgl2 DCs
expressed: Mgl2
>Plasmacytoid DCs
expressed: Plac8
>H2-M2 DCs
expressed: Epsti1, H2-M2
>Granulocytes
expressed: Il1b, Il1r2
>Endothelial
expressed: Pecam1, Flt1, Chd5, Kdr
>Fibroblasts
expressed: Dcn, Acta2, Inmt
>B cells
expressed: Cd19, Ms4a1, Cd79a
>Monocyte progenitor cell
expressed: Ctsg, Mpo
>Basophil
expressed: Ccl3, Ccl4
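The marker lists on this poster follow a simple text layout: a `>` line names a cell type, followed by `expressed:` and optional `not expressed:` gene lists (later lists also use `subtype of:` and `#` comment lines). As an illustration only, here is a minimal Python sketch of a reader for that layout. The names `CellType` and `parse_markers` are hypothetical and are not part of CellAnnotatoR or Garnett.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CellType:
    name: str
    expressed: list = field(default_factory=list)      # positive markers
    not_expressed: list = field(default_factory=list)  # negative markers
    parent: Optional[str] = None                       # from "subtype of:"


def parse_markers(text):
    """Parse '>name' entries into a dict of CellType objects keyed by name."""
    cell_types = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            current = CellType(name=line[1:].strip())
            cell_types[current.name] = current
            continue
        if current is None or ":" not in line:
            raise ValueError(f"unexpected line: {raw!r}")
        key, value = line.split(":", 1)
        key = key.strip().lower()
        if key == "subtype of":
            current.parent = value.strip()
            continue
        genes = [g.strip() for g in value.split(",") if g.strip()]
        if key == "expressed":
            current.expressed = genes
        elif key == "not expressed":
            current.not_expressed = genes
        else:
            raise ValueError(f"unknown field: {key!r}")
    return cell_types


example = """
>Interstitial macrophage
expressed: Apoe, Pf4
not expressed: Trbc2
>T cells
expressed: Cd8b1, Trbc2
"""
parsed = parse_markers(example)
print(parsed["Interstitial macrophage"].not_expressed)
```

Keeping positive markers, negative markers and the parent type as separate fields makes it easy to check, for instance, that every gene named as a negative marker for one type (here Trbc2) is a positive marker for some other type.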
Cell Types
Garnett
Accuracy: 32.3%
Unclassified: 62.7%
Average TPR: 40.1%
Average Precision: 73.2%
Our code (CellAnnotatoR)
Accuracy: 96.6%
Average TPR: 93.9%
Average Precision: 90.3%
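The poster does not spell out how these numbers are computed. The sketch below shows one plausible reading, under stated assumptions: accuracy is the fraction of all cells whose predicted label equals the reference label (unclassified cells count as errors), and "Average" TPR and precision are unweighted means over cell types. The function name `annotation_metrics` and the `"Unknown"` placeholder for unclassified cells are hypothetical.

```python
from collections import Counter


def annotation_metrics(truth, predicted, unclassified="Unknown"):
    """Accuracy, unclassified fraction, macro-averaged TPR and precision."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    n = len(truth)
    n_true = Counter(truth)                                      # cells per reference type
    n_pred = Counter(p for p in predicted if p != unclassified)  # cells per predicted type
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)

    types = sorted(n_true)
    tpr = [correct[t] / n_true[t] for t in types]
    # Assumption: precision is averaged only over types that were predicted at least once.
    precision = [correct[t] / n_pred[t] for t in types if n_pred[t] > 0]
    return {
        "accuracy": sum(correct.values()) / n,
        "unclassified": sum(p == unclassified for p in predicted) / n,
        "average_tpr": sum(tpr) / len(tpr),
        "average_precision": sum(precision) / len(precision) if precision else 0.0,
    }


# Toy labels, made up for illustration (not data from the poster).
truth = ["B", "B", "T", "T", "NK", "NK"]
predicted = ["B", "B", "T", "Unknown", "T", "NK"]
metrics = annotation_metrics(truth, predicted)
print(metrics)
```

Under this reading, a method that leaves most cells unclassified can still show moderate precision while its accuracy and TPR stay low, which matches the pattern reported for Garnett above.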
Figure panels: Paper, CellAnnotatoR, Garnett
We couldn't get good results with CellAssign* (and we're not alone in this: see CellAssign issue #35, "About results reproducibility").
CellAssign also doesn't use information about negative markers or cell type hierarchies.
Accuracy: 7.0%
Average TPR: 10.2%
Average Precision: 6.2%
Cell Types
>Inhibitory
expressed: Gad1
not expressed: Slc17a7
>Excitatory
expressed: Slc17a6, Slc17a7, Sema3c
not expressed: Gad1
>OD Mature
expressed: Ttyh2, Mbp, Opalin
not expressed: Pdgfra
>OD Immature
expressed: Pdgfra, Mki67
>Astrocyte
expressed: Aqp4
>Microglia
expressed: Selplg
>Ependymal
expressed: Cd24a
not expressed: Gad1
>Endothelial
expressed: Fn1
>Pericytes
expressed: Myh11
>Endothelial 1
expressed: Igf1r
subtype of: Endothelial
>Endothelial 2
expressed: Bmp7, Lepr
subtype of: Endothelial
>Endothelial 3
expressed: Ace2
subtype of: Endothelial
>OD Immature 1
expressed: Traf4
subtype of: OD Immature
>OD Immature 2
expressed: Mki67
subtype of: OD Immature
Garnett
Accuracy: 23.3%
Unclassified: 75.1%
Average TPR: 8.4%
Average Precision: 26.2%
CellAnnotatoR
Accuracy: 90.0%
Average TPR: 84.6%
Average Precision: 83.7%
Figure panels: Paper, CellAnnotatoR, Garnett
Black crosses mark cells with ambiguous annotation
(our data)
>Astrocytes
expressed: SLC1A3, GJB6, FGFR3
not expressed: RBFOX3, SYP
>Microglia
expressed: CX3CR1, GPR34, P2RY12, MRC1
not expressed: RBFOX3, SYP
>Oligodendrocytes
expressed: MOG, ERMN
not expressed: RBFOX3, SYP
>Oligodendrocyte Precursors
expressed: CSPG4, PDGFRA, VCAN
not expressed: RBFOX3, SYP
>Vascular
expressed: DCN, PTGDS, ATP1A2, ITIH5, FLT1
not expressed: RBFOX3, SYP
>Neurons
expressed: SYT1, SYP, SNAP25, RBFOX3
not expressed: MOG, ERMN, SLC1A3, CX3CR1, GPR34
# Neurons
>Inhibitory
expressed: GAD1, GAD2, SOX6, PVALB, SST, VIP, LHX6, NDNF, CALB2, SULF1
not expressed: SLC17A7, SATB2
subtype of: Neurons
>Excitatory
expressed: SLC17A7, SATB2, RORB, CUX2, TLE4, NR4A2, SEMA3C
not expressed: GAD1, GAD2, SOX6, PVALB
subtype of: Neurons
# Inhibitory
>Pvalb
expressed: PVALB, NOS1, SULF1, LHX6, KCNS3, CRH, PLEKHH2
not expressed: LAMP5, ID2, SST, FAM89A, RELN, SEMA6A, TAC3, DDR2, VIP
subtype of: Inhibitory
>Lamp5
expressed: ID2, LAMP5, SV2C, PDGFD, CCK, RELN
not expressed: VIP, CALB2, SST, FAM89A, DDR2, NR2F2
subtype of: Inhibitory
>Sst
expressed: SST, NOS1, SEMA6A, FAM89A, LHX6
not expressed: VIP, CALB2, CRH, CHAT, CCK, LAMP5, ID2, SV2C, PDGFD, PVALB, KCNS3
subtype of: Inhibitory
>Vip
expressed: VIP, TAC3, CALB2, NR2F2, LAMA3, COL5A2, SEMA3C, FAM19A1
not expressed: ID2, NOS1, LAMP5, PDGFD
subtype of: Inhibitory
## PVALB
>Pvalb_Nos1
expressed: NOS1
not expressed: CRH
subtype of: Pvalb
>Pvalb_Sulf1
expressed: SULF1
not expressed: NOS1, CRH
subtype of: Pvalb
>Pvalb_Crh
expressed: CRH, PLEKHH2
not expressed: NOS1, RGS5
subtype of: Pvalb
## LAMP5
>Lamp5_Nos1
expressed: NOS1, SFRP1
not expressed: LAMA3
subtype of: Lamp5
>Lamp5_Crh
expressed: CRH, SFRP1
subtype of: Lamp5
>Lamp5_Reln
expressed: RELN, LAMA3
not expressed: ID2
subtype of: Lamp5
## SST
>Sst_Tac3_Lhx6
expressed: TAC3, LHX6
not expressed: CALB1
subtype of: Sst
>Sst_Calb1
expressed: CALB1
not expressed: TAC3
subtype of: Sst
## VIP
>Vip_Crh
expressed: CRH, TAC3, IGFBP5
not expressed: SEMA3C, SEMA6A, NR2F2
subtype of: Vip
>Vip_Nr2f2
expressed: CRH, NR2F2, IGFBP5
not expressed: SEMA3C, SEMA6A, TAC3, RELN
subtype of: Vip
>Vip_Sema3
expressed: SEMA3C, SEMA6A, COL5A2
not expressed: CRH, RELN
subtype of: Vip
>Vip_Reln
expressed: RELN, DDR2
not expressed: TAC3, SEMA3C, IGFBP5
subtype of: Vip
>Vip_Cck
expressed: CCK, FAM19A1, NR2F2
not expressed: RELN, TAC3, IGFBP5, SEMA3C
subtype of: Vip
# Excitatory
>L2/3_Cux2
expressed: LAMP5, CUX2, COL5A2
not expressed: PDGFD, FAT4, PARD3, PRSS12, GABRG1, COBLL1, PXDN
subtype of: Excitatory
>L2_Lamp5
expressed: LAMP5, CUX2, PDGFD, PARD3
not expressed: RORB, GABRG1, COL5A2, PXDN
subtype of: Excitatory
>L3_Prss12
expressed: PRSS12, RORB, COBLL1, CUX2
not expressed: LAMP5, GABRG1, GRIN3A, CMTM8, PXDN, OPRK1, PDGFD, FAT4, PDZD2
subtype of: Excitatory
>L3_Plch1
expressed: PRSS12, RORB, COBLL1, PLCH1
not expressed: LAMP5, GABRG1, GRIN3A, CMTM8, PXDN, OPRK1
subtype of: Excitatory
>L4_Rorb
expressed: RORB, GABRG1, CUX2
not expressed: PRSS12, CMTM8, PXDN, OPRK1, LAMP5
subtype of: Excitatory
>L5_Grin3a
expressed: GRIN3A, TLL1, CMTM8, RORB, TOX
not expressed: HTR2C, CUX2, PXDN, OPRK1, GABRG1
subtype of: Excitatory
>L5_Htr2c
expressed: HTR2C, PARD3, NXPH2, TLE4
not expressed: CMTM8, PXDN, LGR6
subtype of: Excitatory
>L6_Nr4a2
expressed: NR4A2, POSTN, HTR2C
not expressed: PRSS12, KCNIP1, NXPH2, PXDN
subtype of: Excitatory
>L6_Syn3
expressed: PXDN, OPRK1
not expressed: CUX2, RORB, HTR2C, CMTM8
subtype of: Excitatory
>L6_Tle4
expressed: TLE4, LGR6
not expressed: CUX2, RORB, HTR2C
subtype of: Excitatory
## L5_Grin3a
>L5_Grin3a_Fstl4
expressed: FSTL4, PRKG1
not expressed: FAM19A1, NTM, RGS6, SLIT3
subtype of: L5_Grin3a
>L5_Grin3a_Tox
expressed: TOX, DCC
not expressed: FAM19A1, NTM, ROBO2, RGS6, SLIT3
subtype of: L5_Grin3a
>L5_Grin3a_Slit3
expressed: FAM19A1, NTM, ROBO2, RGS6, SLIT3
subtype of: L5_Grin3a
## L6_Tle4
>L6_Tle4_Lsamp
expressed: LSAMP, RYR2
not expressed: CDH10, CNTN4
subtype of: L6_Tle4
>L6_Tle4_Cdh10
expressed: CDH10, CNTN4
not expressed: LSAMP, RYR2
subtype of: L6_Tle4
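The `subtype of:` fields in the list above define a tree of cell types. As a rough illustration, the Python sketch below groups types by their depth in that tree and lists the ancestors of a given type. The helper names `hierarchy_levels` and `ancestors` are hypothetical, and the input is a hand-copied subset of the entries above rather than the full file.

```python
def hierarchy_levels(parents):
    """Group cell types by depth; `parents` maps each type to its parent (None at the top)."""
    def depth(name, seen=()):
        if name in seen:
            raise ValueError(f"cycle in hierarchy at {name!r}")
        parent = parents.get(name)
        return 1 if parent is None else 1 + depth(parent, seen + (name,))

    levels = {}
    for name in parents:
        levels.setdefault(depth(name), []).append(name)
    return levels


def ancestors(name, parents):
    """Return the chain of parent types from `name` up to the top level."""
    path = []
    while parents.get(name) is not None:
        name = parents[name]
        path.append(name)
    return path


# Subset of the "subtype of:" relations listed above.
parents = {
    "Neurons": None,
    "Inhibitory": "Neurons",
    "Excitatory": "Neurons",
    "Pvalb": "Inhibitory",
    "Pvalb_Crh": "Pvalb",
    "L6_Tle4": "Excitatory",
    "L6_Tle4_Cdh10": "L6_Tle4",
}
levels = hierarchy_levels(parents)
print(levels)
print(ancestors("Pvalb_Crh", parents))
```

With the levels made explicit, a cell can be reported at the deepest level where its label is still confident, for example as "Pvalb" when the markers cannot separate Pvalb_Crh from Pvalb_Nos1.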
Cell Types
Cell type hierarchy
(our data)
CellAnnotatoR
Garnett
"Recognized" cells:
85.0%
78.4%
16.6%
4.0%
No "ground truth" here, but we validated our annotation with the corresponding markers.
"Recognized" cells: the fraction of cells that have at least some label at the corresponding hierarchy level.
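To make the "recognized" fraction concrete, here is a minimal Python sketch that computes it per hierarchy level. It assumes that each cell has one label per level and that a missing label is stored as `None`. The function name `recognized_fraction` and the toy labels are made up for illustration.

```python
def recognized_fraction(labels_by_level):
    """For each level, return the fraction of cells that received any label."""
    return {
        level: sum(label is not None for label in labels) / len(labels)
        for level, labels in labels_by_level.items()
    }


# Toy example with four cells and two hierarchy levels (not data from the poster).
labels_by_level = {
    1: ["Neurons", "Neurons", "Astrocytes", None],
    2: ["Inhibitory", None, None, None],
}
fractions = recognized_fraction(labels_by_level)
print(fractions)
```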