Mechthild Lütge
04.12.2023
Joined the Bork group at EMBL Heidelberg as a Master's student
Bacterial pan-genomes: bacterial genomic data from a large database freeze
Joined the Immunobiology group at the Kantonsspital St. Gallen as a PhD student
Single-cell transcriptomics: immune cell niches
Research experience in applied bioinformatics
>30 projects:
scRNAseq (10X, smartSeq), scVDJ, spatial transcriptomics (Visium), microbiome data (16S), metagenome data
Mapping pipeline
Idea:
seurat.rds
Download raw data
Mapping to reference genome
Quality control and filtering
Normalization, dimensionality reduction, clustering
Data exploration
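The quality control and normalization steps listed above can be illustrated with a small sketch. The pipeline on these slides does this in R (the output is a seurat.rds file); the Python below is only a dependency-free illustration. The thresholds `min_genes` and `min_total`, the gene names, and the counts are hypothetical. Scaling each cell to 10,000 counts and then applying log1p is assumed to correspond to the common default log-normalization.

```python
import math


def filter_cells(counts, min_genes=2, min_total=5):
    """Keep cells with enough detected genes and enough total counts.

    The thresholds are hypothetical; real values depend on the data set.
    """
    kept = {}
    for cell, gene_counts in counts.items():
        detected = sum(1 for c in gene_counts.values() if c > 0)
        total = sum(gene_counts.values())
        if detected >= min_genes and total >= min_total:
            kept[cell] = gene_counts
    return kept


def log_normalize(counts, scale_factor=10_000):
    """Scale each cell to scale_factor total counts, then apply log1p."""
    normalized = {}
    for cell, gene_counts in counts.items():
        total = sum(gene_counts.values())
        normalized[cell] = {
            gene: math.log1p(c / total * scale_factor)
            for gene, c in gene_counts.items()
        }
    return normalized


# Toy count table: cell -> gene -> UMI count (made-up numbers).
raw = {
    "cell_a": {"CD3E": 4, "CD19": 0, "CCL19": 6},
    "cell_b": {"CD3E": 0, "CD19": 1, "CCL19": 0},  # nearly empty droplet
}
filtered = filter_cells(raw)
normalized = log_normalize(filtered)
```

Filtering runs before normalization so that low-quality cells do not influence any later step such as clustering.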
Makefile:
### sample specifications are read from config.json
## runName := $(shell python3 config.py runName) ...
### all, clean and debug are phony targets (they do not name files)
.PHONY: all clean debug
### require all sample names to be processed
all: $(sampleName)
debug:
	@echo "runName: $(runName)"
	@echo "sampleName: $(sampleName)"
	@echo "projectPath: $(projectPath)"
	@echo "referenceFasta: $(referenceFasta)"
	@echo "referenceGtf: $(referenceGtf)"
	@echo "referenceDir: $(referenceDir)"
	@echo "organism: $(organism)"
	@echo "sceScript: $(sceScript)"
$(sampleName): /data/raw/$(runName)/$(sampleName).tar $(referenceDir)/reference.json /data/processed/$(runName)/$(sampleName)_seurat.rds
	@echo "Processing Sample: $@"
### Download sample
/data/raw/$(runName)/$(sampleName).tar:
	@echo "Downloading Sample: $(sampleName)"
	wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" -e robots=off --user $(user) --password $(password) $(baseURL)$(runName)/$(sampleName).tar \
	  --directory-prefix /data/raw/
### Create reference
$(referenceDir)/reference.json:
	@echo "Creating Reference: $@"
	cd /data/reference && $(cellrangerPath)/cellranger mkref --nthreads=32 --genome=$(notdir $(referenceDir)) --fasta=$(notdir $(referenceFasta)) \
	  --genes=$(notdir $(referenceGtf))
### Extract, map and process sample
/data/processed/$(runName)/$(sampleName)_seurat.rds: /data/raw/$(runName)/$(sampleName).tar
	@echo "Extracting Sample: $(sampleName)"
	mkdir -p /data/tmp/$(runName)
	tar -xf /data/raw/$(runName)/$(sampleName).tar --directory /data/tmp/$(runName)
	@echo "Mapping Sample: $(sampleName)"
	mkdir -p /data/mapped/$(runName)
	cd /data/mapped/$(runName) && $(cellrangerPath)/cellranger count --id=$(sampleName) --fastqs=/data/tmp/$(runName)/$(sampleName) \
	  --sample=$(sampleName) --nosecondary --transcriptome=$(referenceDir) --localcores=32
	rm -rf /data/tmp/$(runName)/$(sampleName)
	@echo "Processing Sample: $(sampleName)"
	R CMD BATCH "--args /data/mapped/$(runName)/$(sampleName)/outs/filtered_feature_bc_matrix $(organism) $(runName) $(sampleName) \
	/data/processed/$(runName)/$(sampleName)_seurat.rds" $(sceScript)
### Clean up generated data
clean:
	rm -rf /data/raw/$(runName) /data/tmp/$(runName) /data/mapped/$(runName) /data/processed/$(runName)
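The Makefile fills its variables with calls of the form `python3 config.py runName`, but config.py itself is not shown on the slide. A minimal sketch of what such a helper could look like, assuming config.json is a flat JSON object whose keys match the Makefile variable names. The function names and the handling of list values are assumptions for illustration.

```python
import json
import sys


def format_value(value):
    """Render a config value as text that Make can use as a variable."""
    if isinstance(value, list):
        # Make treats whitespace-separated words as a list.
        return " ".join(str(item) for item in value)
    return str(value)


def lookup(config, key):
    """Return the formatted value for key; raises KeyError if it is missing."""
    return format_value(config[key])


def load_config(path="config.json"):
    """Read the JSON file that holds the sample specification."""
    with open(path) as handle:
        return json.load(handle)


# Called as: python3 config.py <key>
# Does nothing when no key is given, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(lookup(load_config(), sys.argv[1]))
```

Printing exactly one value per call keeps the Makefile side simple: each `$(shell ...)` call yields the text for one variable.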
Mapping pipeline
Idea:
Cron Job:
0 22 * * * (daily at 22:00)
Download raw data
Mapping to reference genome
Quality control and filtering
Normalization, dimensionality reduction, clustering
Data exploration
Makefile:
seurat.rds
pipeline.py:
fetch_sample.bash:
Genomics Viewer:
Web app to access and edit the database of all samples
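The slide names pipeline.py as the step between the nightly cron job and the Makefile without showing its contents. One plausible shape, sketched under stated assumptions: for each pending sample it writes config.json and then invokes `make` with the sample name as target. All function names, the structure of the sample records, and the one-sample-at-a-time loop are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def write_config(sample, config_path):
    """Write the sample specification that the Makefile reads via config.py."""
    Path(config_path).write_text(json.dumps(sample, indent=2))


def make_command(sample_name, makefile="Makefile"):
    """Build the make invocation for one sample target."""
    return ["make", "-f", makefile, sample_name]


def run_pending(samples, config_path="config.json", runner=subprocess.run):
    """Process samples one by one; return the names that finished cleanly.

    runner is injectable so the loop can be exercised without calling make.
    """
    finished = []
    for sample in samples:
        write_config(sample, config_path)
        result = runner(make_command(sample["sampleName"]), check=False)
        if result.returncode == 0:
            finished.append(sample["sampleName"])
    return finished
```

Because Make skips targets whose output files already exist, a failed sample can simply be retried on the next nightly run.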
Downstream analyses - How to decide on a tool?
Considerations:
Project idea:
Localization of perivascular reticular cells (PRCs) in human lymph nodes using spatial transcriptomics and scRNAseq data → requires a tool for cell-type decomposition
Perivascular reticular cells
Histological images, biological knowledge about lymph node architecture
Spatial transcriptomics + single cell reference
Select tools to test:
SpaTalk, RCTD (spacexr)
SpaTalk:
Decomposition based on a non-negative linear model
RCTD:
Maximum-likelihood estimation of a statistical model for the mixture of cell types at each pixel, assuming Poisson-distributed gene counts
Define evaluation criteria:
Iterative testing and parameter optimization for each tool:
[Figure panels: PRC; RCTD vs. SpaTalk]
[Figure panels: T cell, B cell, PRC; RCTD vs. SpaTalk]
Thank you for your time!