Baysor: segmentation of spatial transcriptomics data

Viktor Petukhov (1,2), Peter Kharchenko (2,3)

1. University of Copenhagen, BRIC
2. Harvard Medical School, DBMI
3. Harvard Stem Cell Institute

https://bit.ly/2WRTWg5

Problem

How to map expression to space?

*https://www.biosearchtech.com/cancer-RNA-FISH

Spatial gene expression patterns

pciSeq [1]

MERFISH [2]

1. X. Qian, K.D. Harris, T. Hauling, D. Nicoloutsopoulos, A.M. Manchado, N. Skene, J. Hjerling-Leffler, M. Nilsson, bioRxiv 2018, 431957276097

2. Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362 (2018)

MERFISH
ISS
osmFISH
smFISH
BaristaSeq

Up to 10 cells
Up to 10 transcripts
Up to 10000 genes

Spatial protocols based on RNA-FISH or in situ sequencing

DARTFISH
ex-FISH
StarMAP
seq-FISH
FISSEQ

Protocols

Data

Segmentation problems

Molecules

DAPI

*Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362 (2018)

Segmentation problems

*Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362 (2018)

Segmentation problems

MERFISH [1]

osm-FISH [2]

1. Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362 (2018)

2. Simone Codeluppi, Lars E. Borm, Amit Zeisel, Gioele La Manno, Josina A. van Lunteren, Camilla I. Svensson & Sten Linnarsson. Nature Methods 15, 932–935 (2018)

Methods

Preparation: local gene expression

Gene coloring

Local expression coloring

k nearest neighbors

Gene 1	...	Gene K
N_1	...	N_K

Local expression vector LE

Embed to 3d CIELAB colorspace
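The local expression step above can be sketched as follows. This is a minimal illustration, not Baysor's implementation: the function name `local_expression`, the brute-force neighbor search, and the toy coordinates are all assumptions, and the final embedding into CIELAB color space is left out.

```python
import math

def local_expression(molecules, genes, k):
    """Return one normalized gene-count vector (LE) per molecule.

    molecules: list of (x, y, gene) tuples; genes: ordered list of gene names.
    The k nearest molecules are counted, the molecule itself included.
    """
    gene_index = {g: i for i, g in enumerate(genes)}
    vectors = []
    for (x0, y0, _) in molecules:
        # Brute-force search: fine for a toy example, too slow for real data.
        nearest = sorted(
            molecules, key=lambda m: math.hypot(m[0] - x0, m[1] - y0)
        )[:k]
        counts = [0] * len(genes)
        for (_, _, gene) in nearest:
            counts[gene_index[gene]] += 1
        vectors.append([c / len(nearest) for c in counts])
    return vectors

# Toy data (hypothetical): two small clumps of molecules.
molecules = [
    (0.0, 0.0, "Gad2"), (0.1, 0.0, "Gad2"), (0.0, 0.1, "Tbr1"),
    (5.0, 5.0, "Tbr1"), (5.1, 5.0, "Tbr1"), (5.0, 5.1, "Gad2"),
]
le = local_expression(molecules, ["Gad2", "Tbr1"], k=3)
print(le[0])  # gene composition around the first molecule
```

Each vector sums to 1, so molecules from neighborhoods with a similar gene mix get similar vectors and would therefore receive similar colors.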

Baysor: segmentation of spatial transcriptomics data

Transcript data

X	Y	Gene
...	...	...

Expected cell size

Optional: DAPI or Poly-A staining

Algorithm: toy example

Gene 1: 20%

Gene 2: 80%

Gene 1: 80%

Gene 2: 20%

What's the source?

X	Y	Gene
...	...	...

We know
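The toy example can be worked through with Bayes' rule. The sketch below reads the slide as two candidate cells with the gene compositions given above and assumes equal prior probability for both; the names `cell A`, `cell B`, and `source_posterior` are hypothetical.

```python
def source_posterior(observed_genes, compositions, priors=None):
    """Posterior probability of each candidate source cell for a set of molecules."""
    cells = list(compositions)
    if priors is None:
        # Assumption: every candidate cell is equally likely a priori.
        priors = {c: 1.0 / len(cells) for c in cells}
    scores = {}
    for c in cells:
        likelihood = priors[c]
        for gene in observed_genes:
            likelihood *= compositions[c][gene]
        scores[c] = likelihood
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

compositions = {
    "cell A": {"Gene 1": 0.2, "Gene 2": 0.8},
    "cell B": {"Gene 1": 0.8, "Gene 2": 0.2},
}
one_molecule = source_posterior(["Gene 1"], compositions)
two_molecules = source_posterior(["Gene 2", "Gene 2"], compositions)
print(one_molecule, two_molecules)
```

A single Gene 1 molecule gives cell A a posterior of 0.2, and two Gene 2 molecules give it 0.32 / 0.34, roughly 0.94. Gene identity alone is informative, and the evidence accumulates quickly over several molecules.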

Baysor model

Cell as a distribution

Non-conjugate, but has good parametrization

(mean and std instead of #degrees of freedom for Inverse Gamma)

Doesn't work yet

Baysor model

Molecules as a random field

Points are molecules
Point colors are transcript types (i.e. genes)
Lines are the random field edges
Background colors are some cell segmentation

Triangulation

w_{u,v} = p(cell(m_u) = cell(m_v) \mid m_u, m_v) \sim \frac{cor(LE_u, LE_v)}{\sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}}
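A minimal sketch of the edge weight, assuming Pearson correlation for cor and assuming that negative correlations are clipped to zero so the weight stays non-negative (the slide does not say how they are handled). The function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def edge_weight(pos_u, pos_v, le_u, le_v):
    """Unnormalized weight of the edge between molecules u and v."""
    dist = math.hypot(pos_u[0] - pos_v[0], pos_u[1] - pos_v[1])
    if dist == 0:
        raise ValueError("molecules share the same position")
    # Assumption: clip negative correlations to zero.
    return max(pearson(le_u, le_v), 0.0) / dist

same_mix = edge_weight((0, 0), (3, 4), [0.6, 0.3, 0.1], [0.6, 0.3, 0.1])
opposite_mix = edge_weight((0, 0), (3, 4), [0.6, 0.3, 0.1], [0.1, 0.3, 0.6])
print(same_mix, opposite_mix)
```

Two molecules 5 units apart with identical local expression get weight 1/5, while the same pair with opposite local expression gets weight 0.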

edge weight

local expression

Baysor: Dirichlet Mixture Models

1. Initialize cells from some clustering (K-Shift is used)
2. Estimate the probabilities that molecules belong to the cells (E-step)
3. Stochastically assign molecules to the cells (S-step)
4. Maximize parameters of the cell distributions (M-step)
5. Optionally: update priors
6. Sample new cells from the Dirichlet prior (key difference from the SEM algorithm)
7. Go to step 2
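The E-, S-, and M-steps of the loop above can be illustrated on a much simpler problem. The sketch below runs plain stochastic EM on a one-dimensional Gaussian mixture; it is an assumption-laden toy, not Baysor: it has no spatial edge weights, no gene composition, no prior updates, and no sampling of new cells, and all names and data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def stochastic_em(xs, means, sds, n_iters=50, seed=0):
    rng = random.Random(seed)
    k = len(means)
    means, sds = list(means), list(sds)
    sizes = [len(xs) / k] * k
    labels = [0] * len(xs)
    for _ in range(n_iters):
        for i, x in enumerate(xs):
            # E-step: unnormalized membership scores, weighted by component size.
            scores = [sizes[c] * normal_pdf(x, means[c], sds[c]) for c in range(k)]
            if sum(scores) == 0:
                continue  # numerically unreachable point: keep the previous label
            # S-step: draw a label instead of keeping soft assignments.
            labels[i] = rng.choices(range(k), weights=scores)[0]
        # M-step: re-estimate each component from its sampled members.
        for c in range(k):
            members = [x for x, label in zip(xs, labels) if label == c]
            sizes[c] = len(members)
            if len(members) >= 2:
                means[c] = sum(members) / len(members)
                var = sum((x - means[c]) ** 2 for x in members) / len(members)
                sds[c] = max(math.sqrt(var), 1e-3)
    return labels, means, sds

data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(100)] + [data_rng.gauss(10, 1) for _ in range(100)]
fit_labels, fit_means, fit_sds = stochastic_em(xs, means=[2.0, 8.0], sds=[3.0, 3.0])
print(fit_means)
```

A component whose size drops to zero stops attracting points, which loosely mirrors a cell disappearing during sampling.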

Baysor: one step example

E-step:

f_c(m_t) = \#molecules(c) \cdot N(x_t, y_t \mid \mu_c, \Sigma_c) \cdot Cat(g_t \mid G_c)

Distribution for S-step:

p(m_t \in c) = \frac{\sum_{v \in adj(t) : cell(v) = c} w_{v,t} f_c(m_t)}{\sum_{v \in adj(t)} w_{v,t} f_{cell(v)}(m_t)}
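One step for a single molecule can be computed directly. In this sketch the sums are read as running over the molecules adjacent to m_t, and the cell parameters, neighbor list, and dictionary layout are hypothetical values chosen for illustration.

```python
import math

def gaussian_2d(x, y, mu, cov):
    """Density of a 2D normal with mean mu and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form with the inverse covariance, written out for the 2x2 case.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def component_density(mol, cell):
    """f_c(m_t): cell size times position density times gene probability."""
    x, y, gene = mol
    position = gaussian_2d(x, y, cell["mu"], cell["cov"])
    return cell["n_molecules"] * position * cell["gene_probs"].get(gene, 0.0)

def assignment_probs(mol, neighbors, cells):
    """neighbors: (cell_id, edge_weight) pairs for the molecules adjacent to mol."""
    scores = {}
    for cell_id, weight in neighbors:
        density = component_density(mol, cells[cell_id])
        scores[cell_id] = scores.get(cell_id, 0.0) + weight * density
    total = sum(scores.values())
    if total == 0:
        raise ValueError("molecule has zero density under all adjacent cells")
    return {c: s / total for c, s in scores.items()}

identity = ((1.0, 0.0), (0.0, 1.0))
cells = {
    "A": {"n_molecules": 10, "mu": (0.0, 0.0), "cov": identity,
          "gene_probs": {"Gad2": 0.8, "Tbr1": 0.2}},
    "B": {"n_molecules": 10, "mu": (4.0, 0.0), "cov": identity,
          "gene_probs": {"Gad2": 0.2, "Tbr1": 0.8}},
}
probs = assignment_probs((2.0, 0.0, "Gad2"), [("A", 1.0), ("B", 1.0)], cells)
print(probs)
```

The molecule sits halfway between the two cells and has one equally weighted neighbor in each, so only the gene term separates them and p(m_t in A) comes out at 0.8.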

Baysor: Algorithm demonstration

Results

Problem: validation of segmentation

1. Number of cells
2. Fraction of molecules assigned to noise
3. Number of doublets based on expression markers
4. Contamination level based on segmentation-free cell type assignment
5. Detailed comparison with manual segmentation of Allen smFISH

*Items 3 and 4 are probably the same
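The first two metrics are simple to compute from a per-molecule assignment vector. The sketch below assumes that each molecule carries an integer cell label and that label 0 marks noise; both conventions are assumptions for illustration.

```python
def segmentation_summary(assignment, noise_label=0):
    """Number of cells and fraction of molecules assigned to noise."""
    n_noise = sum(1 for label in assignment if label == noise_label)
    cells = {label for label in assignment if label != noise_label}
    return {
        "n_cells": len(cells),
        "noise_fraction": n_noise / len(assignment),
    }

# Toy assignment (hypothetical): 8 molecules, 3 cells, 2 noise molecules.
summary = segmentation_summary([1, 1, 2, 0, 2, 3, 0, 1])
print(summary)
```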

Segmentation results

Local gene composition

Cell type

Segmentation results

Number of segmented cells

Protocol	Baysor	Staining
osm-FISH	10059	4572
Allen sm-FISH	4435	2525
MERFISH (subset)	9279	6119
pciSeq	2547	3413

% of assigned molecules

Protocol	Baysor	Staining
osm-FISH	87.4	44.1
Allen sm-FISH	79.6	61.6
MERFISH (subset)	75.5	47.4
pciSeq	25.7	25.8

Reducing expression contamination

[1]

1. Simone Codeluppi, Lars E. Borm, Amit Zeisel, Gioele La Manno, Josina A. van Lunteren, Camilla I. Svensson & Sten Linnarsson. Nature Methods 15, 932–935 (2018)

What is contamination?

Low expression / false positive

Contamination

Solution: segmentation-free type assignment

Local gene composition

Cell type

Step 1: extract markers

osmFISH paper annotation

>Inhibitory
expressed: Gad2, Pthlh, Crh
not expressed: Tbr1, Rorb, Mfge8, Cpne5

>Excitatory
expressed: Tbr1, Lamp5, Rorb, Syt6
not expressed: Mfge8, Gad2, Mrc1

>Astrocytes
expressed: Aldoc, Gfap, Serpinf1, Mfge8
not expressed: Hexb, Lamp5, Mrc1, Gad2, Sox10, Rorb, Tbr1, Syt6, Plp1

>Oligodendrocytes
expressed: Sox10, Plp1, Pdgfra, Tmem6, Itpr2, Ctps, Bmp4, Anln
not expressed: Hexb, Mrc1, Aldoc, Gfap, Gad2, Tbr1

>Microglia
expressed: Hexb
not expressed: Gad2, Tbr1, Gfap, Mfge8

>Macrophages
expressed: Mrc1
not expressed: Rorb, Lamp5, Syt6, Cpne5, Gfap, Mfge8, Plp1

>Vasculature
expressed: Flt1, Apln, Vtn, Acta2
not expressed: Lamp5, Rorb, Sox10, Gad2, Syt6, Crh

>Ventricle
expressed: Ttr, Foxj1
not expressed: Gad2, Cpne5

>Hippocampus
expressed: Kcnip
not expressed: Gad2, Tbr1, Lamp5, Rorb, Slc32a1

## Inhibitory

>Inh Crhbp
expressed: Crhbp
subtype of: Inhibitory

>Inh Cnr1
expressed: Cnr1
subtype of: Inhibitory

>Inh Kcnip
expressed: Kcnip
subtype of: Inhibitory

>Inh Pthlh
expressed: Pthlh
subtype of: Inhibitory

>Inh Vip
expressed: Vip
subtype of: Inhibitory

>Inh Crh
expressed: Crh
not expressed: Vip
subtype of: Inhibitory

## Vasculature

>Vasc Flt1
expressed: Flt1
subtype of: Vasculature

>Vasc Vtn
expressed: Vtn
subtype of: Vasculature

>Vasc Apln
expressed: Apln
subtype of: Vasculature

>Vasc Acta2
expressed: Acta2
subtype of: Vasculature

## Excitatory

>Ex Rorb
expressed: Rorb
subtype of: Excitatory

>Ex Syt6
expressed: Syt6
subtype of: Excitatory

>Ex Tbr1
expressed: Tbr1
not expressed: Syt6, Rorb
subtype of: Excitatory

>Ex Lamp5
expressed: Lamp5
not expressed: Syt6, Rorb
subtype of: Excitatory

## Oligodendrocytes

>Oligo Cop
expressed: Bmp4
subtype of: Oligodendrocytes

>Oligo MF
expressed: Ctps
subtype of: Oligodendrocytes

>Oligo NF
expressed: Itpr2
subtype of: Oligodendrocytes

>Oligo Precursors
expressed: Pdgfra
subtype of: Oligodendrocytes

>Oligo Mature
expressed: Plp1, Anln
not expressed: Itpr2, Ctps, Bmp4
subtype of: Oligodendrocytes

## Ventricle

>Ependymal
expressed: Foxj1
subtype of: Ventricle

>C. Plexus
expressed: Ttr
subtype of: Ventricle

## Astrocytes

>Astro Mfge8
expressed: Mfge8
subtype of: Astrocytes

>Astro Gfap
expressed: Gfap
not expressed: Mfge8
subtype of: Astrocytes

Extracted markers
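The marker annotation above follows a regular text layout: a `>` line names a cell type, and the following lines list expressed genes, non-expressed genes, and an optional parent type. A small parser for that layout might look like the sketch below. The function `parse_markers` and the returned dictionary structure are assumptions for illustration, not the tool actually used for the slides.

```python
def parse_markers(text):
    """Parse '>Type' blocks with 'expressed', 'not expressed', 'subtype of' fields."""
    types = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '## Section' headers carry no markers
        if line.startswith(">"):
            current = line[1:].strip()
            types[current] = {"expressed": [], "not expressed": [], "subtype of": None}
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if current is None or key not in types[current]:
            raise ValueError(f"unexpected line: {raw!r}")
        if key == "subtype of":
            types[current][key] = value.strip()
        else:
            types[current][key] = [g.strip() for g in value.split(",") if g.strip()]
    return types

example = """
>Inhibitory
expressed: Gad2, Pthlh, Crh
not expressed: Tbr1, Rorb, Mfge8, Cpne5

## Inhibitory

>Inh Crh
expressed: Crh
not expressed: Vip
subtype of: Inhibitory
"""
markers = parse_markers(example)
print(markers["Inh Crh"])
```

Keeping the parent link in `subtype of` lets the two annotation levels be rebuilt as a tree after parsing.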

Step 1: extract markers

osmFISH paper annotation

Pagoda embedding, same annotation

Step 1: extract markers

New annotation, level 1

New annotation, level 2

Step 2: extract local vectors

Problems:

1976659 pseudo-cells
Expression is very sparse (10 reads per cell)

Result:

No graph
No embeddings

Step 2: extract local vectors

New annotation, level 1

New annotation, level 2

Step 3: estimate fraction of the most represented type per cell
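Step 3 reduces to a per-cell tally. The sketch below assumes every molecule already has a segmented cell id and a segmentation-free cell type label; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def max_type_fraction(cell_ids, type_labels):
    """For each cell, return its most represented type and that type's fraction."""
    per_cell = defaultdict(Counter)
    for cell, label in zip(cell_ids, type_labels):
        per_cell[cell][label] += 1
    result = {}
    for cell, counts in per_cell.items():
        top_type, top_count = counts.most_common(1)[0]
        result[cell] = (top_type, top_count / sum(counts.values()))
    return result

cell_ids = [1, 1, 1, 1, 2, 2, 2]
type_labels = ["Inh", "Inh", "Inh", "Ex", "Ex", "Ex", "Ex"]
fractions = max_type_fraction(cell_ids, type_labels)
print(fractions)
```

A fraction close to 1 means the molecules of a cell agree on one type, while a lower value points to contamination or a doublet.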

Validation of the approach

Paper

Baysor

Validation of the approach

Cell type

Max. fraction

Validation: zoom in

Validation: add polyT

Paper

Baysor

Validation: summary

We want to improve this plot

Next steps

Cell type expression prior

Idea:

Aggregate expression over similar cells

Problems:

Nearest neighbors depend on the distance measure, and there is no way to choose a good one a priori
Contamination has its own patterns, so similar cells are simply contaminated in the same manner

Example cell

Nearest neighbors

Cell sampling prior

"Contamination" regions are too dense to be noise, but the probability of forming a new cell there is too small

Cell sampling prior

1. Initialize cells from some clustering
2. Estimate the probabilities that molecules belong to the cells (E-step)
3. Stochastically assign molecules to the cells (S-step)
4. Maximize parameters of the cell distributions (M-step)
5. Optionally: update priors
6. Sample new cells from the Dirichlet prior
7. Go to step 2

Split-merge algorithm

Chinese restaurant processes
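For reference, the Chinese restaurant process is the usual sequential view of a Dirichlet process prior: each new item joins an existing group with probability proportional to the group size, or opens a new group with probability proportional to a concentration parameter alpha. The sketch below only samples from that prior; how it would be wired into Baysor is not stated on the slide, and all names are hypothetical.

```python
import random

def sample_crp(n_customers, alpha, seed=0):
    """Draw one seating arrangement from a Chinese restaurant process."""
    rng = random.Random(seed)
    table_sizes = []
    seating = []
    for _ in range(n_customers):
        # Existing tables are weighted by size; the last slot is a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        seating.append(table)
    return seating, table_sizes

seating, table_sizes = sample_crp(50, alpha=1.0)
print(len(table_sizes), table_sizes)
```

Larger alpha values open new tables more often, which in the segmentation analogy would correspond to a higher chance of forming a new cell.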

Segmentation-free DAPI processing

w_{u,v} = p(cell(m_u) = cell(m_v) \mid m_u, m_v) \sim \frac{cor(LE_u, LE_v)}{\sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}}

w_{u,v} = p(cell(m_u) = cell(m_v) \mid m_u, m_v) \sim \frac{cor(LE_u, LE_v)}{\sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}} \cdot F(brightness)

Transcript info

Staining info

Segmentation of "bulk" data

Slide-seq: 10 μm beads

HDST: 2 μm wells

Rodriques S.G., et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution
Vickovic S., et al. High-definition spatial transcriptomics for in situ tissue profiling

Baysor, PM 06 Nov 2019

By Viktor Petukhov

PKLab progress meeting presentation

Viktor Petukhov

PhD student at the University of Copenhagen

github.com/VPetukhov
