Data analysis in paleogenetics with EAGER

Alexander Peltzer, June 27th 2017

Motivation

Large amounts of aDNA data are generated with next-generation sequencing (NGS) technologies
Analysis is difficult due to:
- Size of datasets
- Complexity of the datasets (sequencing errors, deamination)
- Contamination of sequenced samples and libraries

Size of data

(NextSeq 500, HiSeq 2000 are not even listed here anymore!)

Image by Illumina Inc. (c)

Complexity

How do you know what's a variant and what's an error?

Contamination

Contamination estimation is a key component for aDNA projects!

Renaud et al. 2015

Motivation

Only a few aDNA workflows/pipelines are available
- Mostly bash/perl/python scripts, difficult to apply
- Tools tailored for older methods: Sanger sequencing applications won't work with NGS data
PALEOMIX (Schubert et al. 2014) is one of the few exceptions

EAGER Pipeline overview

EAGER: Focus

Automated workflow
Standard operating procedures for aDNA analysis projects
Usable and installable for non-bioinformatics experts

EAGER Features

Raw read processing, quality assessment of NGS data
Mapping methods (BWA, BWA-MEM, Bowtie2, Stampy)
Authentication (mapDamage, DamageProfiler)
Variant calling & filtering (ANGSD, GATK, ...)
Graphical user interface!

EAGER GUI

EAGER Features

Multiple sample mode: Execute same settings on multiple files
ReportTable: Provide reports of analysis runs
Statistics: Quality control, SNP Calling statistics, mapping results

Fail safe

Log files record errors and caveats
Tracked versions of tools (and EAGER)
Reproducible - you can check in your log file what happened!
Restart on error - if something breaks, restart it - EAGER picks up where it left off!
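
The restart idea can be illustrated with a minimal sketch. It is not EAGER's actual implementation: the function name `run_with_resume` and the `.done` marker files are assumptions made up for this example. Each step writes a marker once it finishes, and a rerun skips every step whose marker already exists.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def run_with_resume(workdir: Path, steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order and skip those already completed in an earlier run.

    Completion is tracked with a hypothetical '<step name>.done' marker file;
    this naming is an assumption for illustration only.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    executed = []
    for name, action in steps:
        marker = workdir / f"{name}.done"
        if marker.exists():
            # Step finished in a previous run, so do not repeat it.
            continue
        # If the action raises, no marker is written and the step
        # is retried on the next invocation.
        action()
        marker.touch()
        executed.append(name)
    return executed
```

Calling the function again after a failure reruns only the failed step and the steps after it, which is the behaviour the slide describes.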

# EAGER Version used for this run: 1.92.21
################
#CreateResultsDirectories at 2017-05-19T18:26:11.228 was executed with the following commandline:
mkdir -p /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/0-FastQC/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/1-AdapClip/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/3-Mapper/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/4-Samtools/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/5-DeDup/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/6-QualiMap/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/7-DnaDamage/.tmp /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/8-Preseq/.tmp
################
## Runtime of Module was: 0 seconds.
################
#FastQCdefault at 2017-05-19T18:26:11.29 was executed with the following commandline:
fastqc -o /home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/0-FastQC --extract  -f fastq /home/peltzer/palshare/peltzer/Mummies/RAW/2015-05-22_SequencingRun462/Sample_JK2968/JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz /home/peltzer/palshare/peltzer/Mummies/RAW/2015-05-22_SequencingRun462/Sample_JK2968/JK2968_TGAAGGTCAGCAGA_L001_R2_001.fastq.gz
################
#Picked up JAVA_TOOL_OPTIONS: -Djava.io.tmpdir=/home/peltzer/palshare/peltzer/2017-05-18_Thesis_Runs_Peltzer/EAGER_Evaluation/Mummies_WGS_Screening/Sample_JK2968/0-FastQC/.tmp
Skipping '' which didn't exist, or couldn't be read
Started analysis of JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
Approx 5% complete for JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
Approx 10% complete for JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
Approx 15% complete for JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
Approx 20% complete for JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
Approx 25% complete for JK2968_TGAAGGTCAGCAGA_L001_R1_001.fastq.gz
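
The log excerpt above follows a regular pattern: a header line of the form `#<module> at <timestamp> was executed with the following commandline:`, then the command, then a separator line of `#` characters. A minimal sketch of how such a log could be checked after a run is shown below. The function name `parse_eager_log` is hypothetical, and the parser relies only on the pattern visible in this excerpt; other EAGER versions may format their logs differently.

```python
import re
from typing import Dict, List

# Header pattern taken from the excerpt, e.g.
# "#FastQCdefault at 2017-05-19T18:26:11.29 was executed with the following commandline:"
HEADER = re.compile(
    r"^#(?P<module>\S+) at (?P<timestamp>\S+) was executed with the following commandline:$"
)
# A separator is a line made up only of '#' characters.
SEPARATOR = re.compile(r"^#{5,}$")


def parse_eager_log(text: str) -> List[Dict[str, str]]:
    """Return one entry per module with its name, timestamp and command line."""
    entries = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i].strip())
        if match is None:
            i += 1
            continue
        # Collect everything up to the next separator as the command.
        # Pieces are joined without a space, assuming any line breaks
        # inside the command come from hard wrapping in the middle of a token.
        i += 1
        pieces = []
        while i < len(lines) and not SEPARATOR.match(lines[i].strip()):
            pieces.append(lines[i].strip())
            i += 1
        entries.append(
            {
                "module": match.group("module"),
                "timestamp": match.group("timestamp"),
                "command": "".join(pieces),
            }
        )
    return entries
```

Applied to the excerpt, this would yield one entry for `CreateResultsDirectories` and one for `FastQCdefault`, each with the exact command that was run.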