Diogo N Silva
Writing pipelines in Python/Perl/shell scripts circa 2000, colorized.
The game-changing combination of workflow managers + containers:
>3000 bioinformatics tools developed in 2018
Pipelines need to be disposable
Pipelines, like tools, are temporary
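To make the combination concrete, here is a minimal sketch of a single containerized Nextflow process: the workflow manager handles scheduling, staging and resumption, while the container pins the tool version. The process name, FastQC invocation, default glob and container tag are illustrative assumptions, not part of the nf-core pipeline reproduced below.

// Minimal sketch: one tool wrapped as a containerized process (DSL1 style).
params.reads = 'data/*.fastq.gz'

Channel
    .fromPath(params.reads, checkIfExists: true)
    .set { ch_reads }

process fastqc {
    tag "$fastq"
    // The engine pulls and runs this image; the tag is an assumed example.
    container 'quay.io/biocontainers/fastqc:0.11.9--0'

    input:
    file fastq from ch_reads

    output:
    file "*_fastqc.{zip,html}" into ch_fastqc_results

    script:
    """
    fastqc -q $fastq
    """
}

The full nf-core/chipseq main script below shows what a complete, production pipeline built from such processes looks like in practice.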
#!/usr/bin/env nextflow
/*
========================================================================================
nf-core/chipseq
========================================================================================
nf-core/chipseq Analysis Pipeline.
#### Homepage / Documentation
https://github.com/nf-core/chipseq
----------------------------------------------------------------------------------------
*/
def helpMessage() {
log.info nfcoreHeader()
log.info"""
Usage:
The typical command for running the pipeline is as follows:
nextflow run nf-core/chipseq --design design.csv --genome GRCh37 -profile docker
Mandatory arguments:
--design Comma-separated file containing information about the samples in the experiment (see docs/usage.md)
--fasta Path to Fasta reference. Not mandatory when using reference in iGenomes config via --genome
--gtf Path to GTF file in Ensembl format. Not mandatory when using reference in iGenomes config via --genome
-profile Configuration profile to use. Can use multiple (comma separated)
Available: conda, docker, singularity, awsbatch, test
Generic
--singleEnd Specifies that the input is single-end reads
--seq_center Sequencing center information to be added to read group of BAM files
--fragment_size [int] Estimated fragment size used to extend single-end reads (Default: 200)
--fingerprint_bins [int] Number of genomic bins to use when calculating fingerprint plot (Default: 500000)
References If not specified in the configuration file or you wish to overwrite any of the references
--genome Name of iGenomes reference
--bwa_index Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa
--gene_bed Path to BED file containing gene intervals
--tss_bed Path to BED file containing transcription start sites
--macs_gsize Effective genome size parameter required by MACS2. If using iGenomes config, values have only been provided when --genome is set as GRCh37, GRCm38, hg19, mm10, BDGP6 and WBcel235
--blacklist Path to blacklist regions (.BED format), used for filtering alignments
--saveGenomeIndex If generated by the pipeline save the BWA index in the results directory
Trimming
--clip_r1 [int] Instructs Trim Galore to remove bp from the 5' end of read 1 (or single-end reads) (Default: 0)
--clip_r2 [int] Instructs Trim Galore to remove bp from the 5' end of read 2 (paired-end reads only) (Default: 0)
--three_prime_clip_r1 [int] Instructs Trim Galore to remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed (Default: 0)
--three_prime_clip_r2 [int] Instructs Trim Galore to remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed (Default: 0)
--skipTrimming Skip the adapter trimming step
--saveTrimmed Save the trimmed FastQ files in the results directory
Alignments
--keepDups Duplicate reads are not filtered from alignments
--keepMultiMap Reads mapping to multiple locations are not filtered from alignments
--saveAlignedIntermediates Save the intermediate BAM files from the alignment step - not done by default
Peaks
--narrowPeak Run MACS2 in narrowPeak mode
--broad_cutoff [float] Specifies broad cutoff value for MACS2. Only used when --narrowPeak isn't specified (Default: 0.1)
--min_reps_consensus Number of biological replicates required from a given condition for a peak to contribute to a consensus peak (Default: 1)
--saveMACSPileup Instruct MACS2 to create bedGraph files normalised to signal per million reads
--skipDiffAnalysis Skip differential binding analysis
QC
--skipFastQC Skip FastQC
--skipPicardMetrics Skip Picard CollectMultipleMetrics
--skipPreseq Skip Preseq
--skipPlotProfile Skip deepTools plotProfile
--skipPlotFingerprint Skip deepTools plotFingerprint
--skipSpp Skip Phantompeakqualtools
--skipIGV Skip IGV
--skipMultiQC Skip MultiQC
--skipMultiQCStats Exclude general statistics table from MultiQC report
Other
--outdir The output directory where the results will be saved
--email Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits
--maxMultiqcEmailFileSize Threshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB)
-name Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic
AWSBatch
--awsqueue The AWSBatch JobQueue that needs to be set when running on AWSBatch
--awsregion The AWS Region for your AWS Batch job to run on
""".stripIndent()
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- SET UP CONFIGURATION VARIABLES -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* SET UP CONFIGURATION VARIABLES
*/
// Show help message
if (params.help){
helpMessage()
exit 0
}
// Check if genome exists in the config file
if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}"
}
////////////////////////////////////////////////////
/* -- DEFAULT PARAMETER VALUES -- */
////////////////////////////////////////////////////
// Configurable variables
params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
params.bwa_index = params.genome ? params.genomes[ params.genome ].bwa ?: false : false
params.gtf = params.genome ? params.genomes[ params.genome ].gtf ?: false : false
params.gene_bed = params.genome ? params.genomes[ params.genome ].gene_bed ?: false : false
params.macs_gsize = params.genome ? params.genomes[ params.genome ].macs_gsize ?: false : false
params.blacklist = params.genome ? params.genomes[ params.genome ].blacklist ?: false : false
// Has the run name been specified by the user?
// this has the bonus effect of catching both -name and --name
custom_runName = params.name
if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)){
custom_runName = workflow.runName
}
////////////////////////////////////////////////////
/* -- CONFIG FILES -- */
////////////////////////////////////////////////////
// Pipeline config
ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true)
// JSON files required by BAMTools for alignment filtering
if (params.singleEnd) {
ch_bamtools_filter_config = file(params.bamtools_filter_se_config, checkIfExists: true)
} else {
ch_bamtools_filter_config = file(params.bamtools_filter_pe_config, checkIfExists: true)
}
// Header files for MultiQC
ch_multiqc_config = file(params.multiqc_config, checkIfExists: true)
ch_peak_count_header = file("$baseDir/assets/multiqc/peak_count_header.txt", checkIfExists: true)
ch_frip_score_header = file("$baseDir/assets/multiqc/frip_score_header.txt", checkIfExists: true)
ch_peak_annotation_header = file("$baseDir/assets/multiqc/peak_annotation_header.txt", checkIfExists: true)
ch_deseq2_pca_header = file("$baseDir/assets/multiqc/deseq2_pca_header.txt", checkIfExists: true)
ch_deseq2_clustering_header = file("$baseDir/assets/multiqc/deseq2_clustering_header.txt", checkIfExists: true)
ch_spp_correlation_header = file("$baseDir/assets/multiqc/spp_correlation_header.txt", checkIfExists: true)
ch_spp_nsc_header = file("$baseDir/assets/multiqc/spp_nsc_header.txt", checkIfExists: true)
ch_spp_rsc_header = file("$baseDir/assets/multiqc/spp_rsc_header.txt", checkIfExists: true)
////////////////////////////////////////////////////
/* -- VALIDATE INPUTS -- */
////////////////////////////////////////////////////
// Validate inputs
if (params.design) { ch_design = file(params.design, checkIfExists: true) } else { exit 1, "Samples design file not specified!" }
if (params.gtf) { ch_gtf = file(params.gtf, checkIfExists: true) } else { exit 1, "GTF annotation file not specified!" }
if (params.gene_bed) { ch_gene_bed = file(params.gene_bed, checkIfExists: true) }
if (params.tss_bed) { ch_tss_bed = file(params.tss_bed, checkIfExists: true) }
if (params.blacklist) { ch_blacklist = file(params.blacklist, checkIfExists: true) }
if (params.fasta){
lastPath = params.fasta.lastIndexOf(File.separator)
bwa_base = params.fasta.substring(lastPath+1)
ch_fasta = file(params.fasta, checkIfExists: true)
} else {
exit 1, "Fasta file not specified!"
}
if (params.bwa_index){
lastPath = params.bwa_index.lastIndexOf(File.separator)
bwa_dir = params.bwa_index.substring(0,lastPath+1)
bwa_base = params.bwa_index.substring(lastPath+1)
ch_bwa_index = Channel
.fromPath(bwa_dir, checkIfExists: true)
.ifEmpty { exit 1, "BWA index directory not found: ${bwa_dir}" }
}
////////////////////////////////////////////////////
/* -- AWS -- */
////////////////////////////////////////////////////
if( workflow.profile == 'awsbatch') {
// AWSBatch sanity checking
if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!"
// Check outdir paths to be S3 buckets if running on AWSBatch
// related: https://github.com/nextflow-io/nextflow/issues/813
if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!"
// Prevent trace files from being stored on S3, since S3 does not support rolling files.
if (workflow.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles."
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- HEADER LOG INFO -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
// Header log info
log.info nfcoreHeader()
def summary = [:]
summary['Run Name'] = custom_runName ?: workflow.runName
summary['Data Type'] = params.singleEnd ? 'Single-End' : 'Paired-End'
summary['Design File'] = params.design
summary['Genome'] = params.genome ?: 'Not supplied'
summary['Fasta File'] = params.fasta
summary['GTF File'] = params.gtf
if (params.gene_bed) summary['Gene BED File'] = params.gene_bed
if (params.tss_bed) summary['TSS BED File'] = params.tss_bed
if (params.bwa_index) summary['BWA Index'] = params.bwa_index
if (params.blacklist) summary['Blacklist BED'] = params.blacklist
summary['MACS2 Genome Size'] = params.macs_gsize ?: 'Not supplied'
summary['Min Consensus Reps'] = params.min_reps_consensus
if (params.macs_gsize) summary['MACS2 Narrow Peaks'] = params.narrowPeak ? 'Yes' : 'No'
if (!params.narrowPeak) summary['MACS2 Broad Cutoff'] = params.broad_cutoff
if (params.skipTrimming){
summary['Trimming Step'] = 'Skipped'
} else {
summary['Trim R1'] = "$params.clip_r1 bp"
summary['Trim R2'] = "$params.clip_r2 bp"
summary["Trim 3' R1"] = "$params.three_prime_clip_r1 bp"
summary["Trim 3' R2"] = "$params.three_prime_clip_r2 bp"
}
if (params.seq_center) summary['Sequencing Center'] = params.seq_center
if (params.singleEnd) summary['Fragment Size'] = "$params.fragment_size bp"
summary['Fingerprint Bins'] = params.fingerprint_bins
if (params.keepDups) summary['Keep Duplicates'] = 'Yes'
if (params.keepMultiMap) summary['Keep Multi-mapped'] = 'Yes'
summary['Save Genome Index'] = params.saveGenomeIndex ? 'Yes' : 'No'
if (params.saveTrimmed) summary['Save Trimmed'] = 'Yes'
if (params.saveAlignedIntermediates) summary['Save Intermeds'] = 'Yes'
if (params.saveMACSPileup) summary['Save MACS2 Pileup'] = 'Yes'
if (params.skipDiffAnalysis) summary['Skip Diff Analysis'] = 'Yes'
if (params.skipFastQC) summary['Skip FastQC'] = 'Yes'
if (params.skipPicardMetrics) summary['Skip Picard Metrics'] = 'Yes'
if (params.skipPreseq) summary['Skip Preseq'] = 'Yes'
if (params.skipPlotProfile) summary['Skip plotProfile'] = 'Yes'
if (params.skipPlotFingerprint) summary['Skip plotFingerprint'] = 'Yes'
if (params.skipSpp) summary['Skip spp'] = 'Yes'
if (params.skipIGV) summary['Skip IGV'] = 'Yes'
if (params.skipMultiQC) summary['Skip MultiQC'] = 'Yes'
if (params.skipMultiQCStats) summary['Skip MultiQC Stats'] = 'Yes'
summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job"
if(workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container"
summary['Output Dir'] = params.outdir
summary['Launch Dir'] = workflow.launchDir
summary['Working Dir'] = workflow.workDir
summary['Script Dir'] = workflow.projectDir
summary['User'] = workflow.userName
if (workflow.profile == 'awsbatch'){
summary['AWS Region'] = params.awsregion
summary['AWS Queue'] = params.awsqueue
}
summary['Config Profile'] = workflow.profile
if (params.config_profile_description) summary['Config Description'] = params.config_profile_description
if (params.config_profile_contact) summary['Config Contact'] = params.config_profile_contact
if (params.config_profile_url) summary['Config URL'] = params.config_profile_url
if(params.email) {
summary['E-mail Address'] = params.email
summary['MultiQC Max Size'] = params.maxMultiqcEmailFileSize
}
log.info summary.collect { k,v -> "${k.padRight(20)}: $v" }.join("\n")
log.info "\033[2m----------------------------------------------------\033[0m"
// Check the hostnames against configured profiles
checkHostname()
// Show a big warning message if we're not running MACS
if (!params.macs_gsize){
def warnstring = params.genome ? "supported for '${params.genome}'" : 'supplied'
log.warn "=================================================================\n" +
" WARNING! MACS genome size parameter not $warnstring.\n" +
" Peak calling, annotation and differential analysis will be skipped.\n" +
" Please specify value for '--macs_gsize' to run these steps.\n" +
"======================================================================="
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- PARSE DESIGN FILE -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* PREPROCESSING - REFORMAT DESIGN FILE, CHECK VALIDITY & CREATE IP vs CONTROL MAPPINGS
*/
process checkDesign {
tag "$design"
publishDir "${params.outdir}/pipeline_info", mode: 'copy'
input:
file design from ch_design
output:
file "design_reads.csv" into ch_design_reads_csv
file "design_controls.csv" into ch_design_controls_csv
script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
"""
check_design.py $design design_reads.csv design_controls.csv
"""
}
/*
* Create channels for input fastq files
*/
if (params.singleEnd) {
ch_design_reads_csv.splitCsv(header:true, sep:',')
.map { row -> [ row.sample_id, [ file(row.fastq_1, checkIfExists: true) ] ] }
.into { ch_raw_reads_fastqc;
ch_raw_reads_trimgalore }
} else {
ch_design_reads_csv.splitCsv(header:true, sep:',')
.map { row -> [ row.sample_id, [ file(row.fastq_1, checkIfExists: true), file(row.fastq_2, checkIfExists: true) ] ] }
.into { ch_raw_reads_fastqc;
ch_raw_reads_trimgalore }
}
/*
* Create a channel with [sample_id, control id, antibody, replicatesExist, multipleGroups]
*/
ch_design_controls_csv.splitCsv(header:true, sep:',')
.map { row -> [ row.sample_id, row.control_id, row.antibody, row.replicatesExist.toBoolean(), row.multipleGroups.toBoolean() ] }
.set { ch_design_controls_csv }
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- PREPARE ANNOTATION FILES -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* PREPROCESSING - Build BWA index
*/
if (!params.bwa_index){
process makeBWAindex {
tag "$fasta"
label 'process_high'
publishDir path: { params.saveGenomeIndex ? "${params.outdir}/reference_genome" : params.outdir },
saveAs: { params.saveGenomeIndex ? it : null }, mode: 'copy'
input:
file fasta from ch_fasta
output:
file "BWAIndex" into ch_bwa_index
script:
"""
bwa index -a bwtsw $fasta
mkdir BWAIndex && mv ${fasta}* BWAIndex
"""
}
}
/*
* PREPROCESSING - Generate gene BED file
*/
if (!params.gene_bed){
process makeGeneBED {
tag "$gtf"
label 'process_low'
publishDir "${params.outdir}/reference_genome", mode: 'copy'
input:
file gtf from ch_gtf
output:
file "*.bed" into ch_gene_bed
script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
"""
gtf2bed $gtf > ${gtf.baseName}.bed
"""
}
}
/*
* PREPROCESSING - Generate TSS BED file
*/
if (!params.tss_bed){
process makeTSSBED {
tag "$bed"
publishDir "${params.outdir}/reference_genome", mode: 'copy'
input:
file bed from ch_gene_bed
output:
file "*.bed" into ch_tss_bed
script:
"""
cat $bed | awk -v FS='\t' -v OFS='\t' '{ if(\$6=="+") \$3=\$2+1; else \$2=\$3-1; print \$1, \$2, \$3, \$4, \$5, \$6;}' > ${bed.baseName}.tss.bed
"""
}
}
/*
* PREPROCESSING - Prepare genome intervals for filtering
*/
process makeGenomeFilter {
tag "$fasta"
publishDir "${params.outdir}/reference_genome", mode: 'copy'
input:
file fasta from ch_fasta
output:
file "$fasta" into ch_genome_fasta // FASTA FILE FOR IGV
file "*.fai" into ch_genome_fai // FAI INDEX FOR REFERENCE GENOME
file "*.bed" into ch_genome_filter_regions // BED FILE WITHOUT BLACKLIST REGIONS
file "*.sizes" into ch_genome_sizes_bigwig // CHROMOSOME SIZES FILE FOR BEDTOOLS
script:
blacklist_filter = params.blacklist ? "sortBed -i ${params.blacklist} -g ${fasta}.sizes | complementBed -i stdin -g ${fasta}.sizes" : "awk '{print \$1, '0' , \$2}' OFS='\t' ${fasta}.sizes"
"""
samtools faidx $fasta
cut -f 1,2 ${fasta}.fai > ${fasta}.sizes
$blacklist_filter > ${fasta}.include_regions.bed
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- FASTQ QC -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 1 - FastQC
*/
process fastqc {
tag "$name"
label 'process_medium'
publishDir "${params.outdir}/fastqc", mode: 'copy',
saveAs: {filename -> filename.endsWith(".zip") ? "zips/$filename" : "$filename"}
when:
!params.skipFastQC
input:
set val(name), file(reads) from ch_raw_reads_fastqc
output:
file "*.{zip,html}" into ch_fastqc_reports_mqc
script:
// Added soft-links to original fastqs for consistent naming in MultiQC
if (params.singleEnd) {
"""
[ ! -f ${name}.fastq.gz ] && ln -s $reads ${name}.fastq.gz
fastqc -q ${name}.fastq.gz
"""
} else {
"""
[ ! -f ${name}_1.fastq.gz ] && ln -s ${reads[0]} ${name}_1.fastq.gz
[ ! -f ${name}_2.fastq.gz ] && ln -s ${reads[1]} ${name}_2.fastq.gz
fastqc -q ${name}_1.fastq.gz
fastqc -q ${name}_2.fastq.gz
"""
}
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- ADAPTER TRIMMING -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 2 - Trim Galore!
*/
if (params.skipTrimming){
ch_trimmed_reads = ch_raw_reads_trimgalore
ch_trimgalore_results_mqc = []
ch_trimgalore_fastqc_reports_mqc = []
} else {
process trimGalore {
tag "$name"
label 'process_long'
publishDir "${params.outdir}/trim_galore", mode: 'copy',
saveAs: {filename ->
if (filename.endsWith(".html")) "fastqc/$filename"
else if (filename.endsWith(".zip")) "fastqc/zip/$filename"
else if (filename.endsWith("trimming_report.txt")) "logs/$filename"
else params.saveTrimmed ? filename : null
}
input:
set val(name), file(reads) from ch_raw_reads_trimgalore
output:
set val(name), file("*.fq.gz") into ch_trimmed_reads
file "*.txt" into ch_trimgalore_results_mqc
file "*.{zip,html}" into ch_trimgalore_fastqc_reports_mqc
script:
// Added soft-links to original fastqs for consistent naming in MultiQC
c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : ''
c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : ''
tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : ''
tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : ''
if (params.singleEnd) {
"""
[ ! -f ${name}.fastq.gz ] && ln -s $reads ${name}.fastq.gz
trim_galore --fastqc --gzip $c_r1 $tpc_r1 ${name}.fastq.gz
"""
} else {
"""
[ ! -f ${name}_1.fastq.gz ] && ln -s ${reads[0]} ${name}_1.fastq.gz
[ ! -f ${name}_2.fastq.gz ] && ln -s ${reads[1]} ${name}_2.fastq.gz
trim_galore --paired --fastqc --gzip $c_r1 $c_r2 $tpc_r1 $tpc_r2 ${name}_1.fastq.gz ${name}_2.fastq.gz
"""
}
}
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- ALIGN -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 3.1 - Align read 1 with bwa
*/
process bwaMEM {
tag "$name"
label 'process_high'
input:
set val(name), file(reads) from ch_trimmed_reads
file index from ch_bwa_index.collect()
output:
set val(name), file("*.bam") into ch_bwa_bam
script:
prefix="${name}.Lb"
if (!params.seq_center) {
rg="\'@RG\\tID:${name}\\tSM:${name.split('_')[0..-2].join('_')}\\tPL:ILLUMINA\\tLB:${name}\\tPU:1\'"
} else {
rg="\'@RG\\tID:${name}\\tSM:${name.split('_')[0..-2].join('_')}\\tPL:ILLUMINA\\tLB:${name}\\tPU:1\\tCN:${params.seq_center}\'"
}
"""
bwa mem \\
-t $task.cpus \\
-M \\
-R $rg \\
${index}/${bwa_base} \\
$reads \\
| samtools view -@ $task.cpus -b -h -F 0x0100 -O BAM -o ${prefix}.bam -
"""
}
/*
* STEP 3.2 - Convert .bam to coordinate sorted .bam
*/
process sortBAM {
tag "$name"
label 'process_medium'
if (params.saveAlignedIntermediates) {
publishDir path: "${params.outdir}/bwa/library", mode: 'copy',
saveAs: { filename ->
if (filename.endsWith(".flagstat")) "samtools_stats/$filename"
else if (filename.endsWith(".idxstats")) "samtools_stats/$filename"
else if (filename.endsWith(".stats")) "samtools_stats/$filename"
else filename }
}
input:
set val(name), file(bam) from ch_bwa_bam
output:
set val(name), file("*.sorted.{bam,bam.bai}") into ch_sort_bam_merge
file "*.{flagstat,idxstats,stats}" into ch_sort_bam_flagstat_mqc
script:
prefix="${name}.Lb"
"""
samtools sort -@ $task.cpus -o ${prefix}.sorted.bam -T $name $bam
samtools index ${prefix}.sorted.bam
samtools flagstat ${prefix}.sorted.bam > ${prefix}.sorted.bam.flagstat
samtools idxstats ${prefix}.sorted.bam > ${prefix}.sorted.bam.idxstats
samtools stats ${prefix}.sorted.bam > ${prefix}.sorted.bam.stats
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- MERGE LIBRARY BAM -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 4.1 Merge BAM files for all libraries from same sample
*/
ch_sort_bam_merge.map { it -> [ it[0].split('_')[0..-2].join('_'), it[1] ] }
.groupTuple(by: [0])
.map { it -> [ it[0], it[1].flatten() ] }
.set { ch_sort_bam_merge }
process mergeBAM {
tag "$name"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary", mode: 'copy',
saveAs: { filename ->
if (filename.endsWith(".flagstat")) "samtools_stats/$filename"
else if (filename.endsWith(".idxstats")) "samtools_stats/$filename"
else if (filename.endsWith(".stats")) "samtools_stats/$filename"
else if (filename.endsWith(".metrics.txt")) "picard_metrics/$filename"
else params.saveAlignedIntermediates ? filename : null
}
input:
set val(name), file(bams) from ch_sort_bam_merge
output:
set val(name), file("*${prefix}.sorted.{bam,bam.bai}") into ch_merge_bam_filter,
ch_merge_bam_preseq
file "*.{flagstat,idxstats,stats}" into ch_merge_bam_stats_mqc
file "*.txt" into ch_merge_bam_metrics_mqc
script:
prefix="${name}.mLb.mkD"
bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort()
if (!task.memory){
log.info "[Picard MarkDuplicates] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this."
avail_mem = 3
} else {
avail_mem = task.memory.toGiga()
}
if (bam_files.size() > 1) {
"""
picard -Xmx${avail_mem}g MergeSamFiles \\
${'INPUT='+bam_files.join(' INPUT=')} \\
OUTPUT=${name}.sorted.bam \\
SORT_ORDER=coordinate \\
VALIDATION_STRINGENCY=LENIENT \\
TMP_DIR=tmp
samtools index ${name}.sorted.bam
picard -Xmx${avail_mem}g MarkDuplicates \\
INPUT=${name}.sorted.bam \\
OUTPUT=${prefix}.sorted.bam \\
ASSUME_SORTED=true \\
REMOVE_DUPLICATES=false \\
METRICS_FILE=${prefix}.MarkDuplicates.metrics.txt \\
VALIDATION_STRINGENCY=LENIENT \\
TMP_DIR=tmp
samtools index ${prefix}.sorted.bam
samtools idxstats ${prefix}.sorted.bam > ${prefix}.sorted.bam.idxstats
samtools flagstat ${prefix}.sorted.bam > ${prefix}.sorted.bam.flagstat
samtools stats ${prefix}.sorted.bam > ${prefix}.sorted.bam.stats
"""
} else {
"""
picard -Xmx${avail_mem}g MarkDuplicates \\
INPUT=${bam_files[0]} \\
OUTPUT=${prefix}.sorted.bam \\
ASSUME_SORTED=true \\
REMOVE_DUPLICATES=false \\
METRICS_FILE=${prefix}.MarkDuplicates.metrics.txt \\
VALIDATION_STRINGENCY=LENIENT \\
TMP_DIR=tmp
samtools index ${prefix}.sorted.bam
samtools idxstats ${prefix}.sorted.bam > ${prefix}.sorted.bam.idxstats
samtools flagstat ${prefix}.sorted.bam > ${prefix}.sorted.bam.flagstat
samtools stats ${prefix}.sorted.bam > ${prefix}.sorted.bam.stats
"""
}
}
/*
* STEP 4.2 Filter BAM file at merged library-level
*/
process filterBAM {
tag "$name"
label 'process_medium'
publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy',
saveAs: { filename ->
if (params.singleEnd || params.saveAlignedIntermediates) {
if (filename.endsWith(".flagstat")) "samtools_stats/$filename"
else if (filename.endsWith(".idxstats")) "samtools_stats/$filename"
else if (filename.endsWith(".stats")) "samtools_stats/$filename"
else if (filename.endsWith(".sorted.bam")) filename
else if (filename.endsWith(".sorted.bam.bai")) filename
else null }
}
input:
set val(name), file(bam) from ch_merge_bam_filter
file bed from ch_genome_filter_regions.collect()
file bamtools_filter_config from ch_bamtools_filter_config
output:
set val(name), file("*.{bam,bam.bai}") into ch_filter_bam
set val(name), file("*.flagstat") into ch_filter_bam_flagstat
file "*.{idxstats,stats}" into ch_filter_bam_stats_mqc
script:
prefix = params.singleEnd ? "${name}.mLb.clN" : "${name}.mLb.flT"
filter_params = params.singleEnd ? "-F 0x004" : "-F 0x004 -F 0x0008 -f 0x001"
dup_params = params.keepDups ? "" : "-F 0x0400"
multimap_params = params.keepMultiMap ? "" : "-q 1"
blacklist_params = params.blacklist ? "-L $bed" : ""
name_sort_bam = params.singleEnd ? "" : "samtools sort -n -@ $task.cpus -o ${prefix}.bam -T $prefix ${prefix}.sorted.bam"
"""
samtools view \\
$filter_params \\
$dup_params \\
$multimap_params \\
$blacklist_params \\
-b ${bam[0]} \\
| bamtools filter \\
-out ${prefix}.sorted.bam \\
-script $bamtools_filter_config
samtools index ${prefix}.sorted.bam
samtools flagstat ${prefix}.sorted.bam > ${prefix}.sorted.bam.flagstat
samtools idxstats ${prefix}.sorted.bam > ${prefix}.sorted.bam.idxstats
samtools stats ${prefix}.sorted.bam > ${prefix}.sorted.bam.stats
$name_sort_bam
"""
}
/*
* STEP 4.3 Remove orphan reads from paired-end BAM file
*/
if (params.singleEnd){
ch_filter_bam.into { ch_rm_orphan_bam_metrics;
ch_rm_orphan_bam_bigwig;
ch_rm_orphan_bam_macs_1;
ch_rm_orphan_bam_macs_2;
ch_rm_orphan_bam_phantompeakqualtools;
ch_rm_orphan_name_bam_counts }
ch_filter_bam_flagstat.into { ch_rm_orphan_flagstat_bigwig;
ch_rm_orphan_flagstat_macs;
ch_rm_orphan_flagstat_mqc }
ch_filter_bam_stats_mqc.set { ch_rm_orphan_stats_mqc }
} else {
process rmOrphanReads {
tag "$name"
label 'process_medium'
publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy',
saveAs: { filename ->
if (filename.endsWith(".flagstat")) "samtools_stats/$filename"
else if (filename.endsWith(".idxstats")) "samtools_stats/$filename"
else if (filename.endsWith(".stats")) "samtools_stats/$filename"
else if (filename.endsWith(".sorted.bam")) filename
else if (filename.endsWith(".sorted.bam.bai")) filename
else null
}
input:
set val(name), file(bam) from ch_filter_bam
output:
set val(name), file("*.sorted.{bam,bam.bai}") into ch_rm_orphan_bam_metrics,
ch_rm_orphan_bam_bigwig,
ch_rm_orphan_bam_macs_1,
ch_rm_orphan_bam_macs_2,
ch_rm_orphan_bam_phantompeakqualtools
set val(name), file("${prefix}.bam") into ch_rm_orphan_name_bam_counts
set val(name), file("*.flagstat") into ch_rm_orphan_flagstat_bigwig,
ch_rm_orphan_flagstat_macs,
ch_rm_orphan_flagstat_mqc
file "*.{idxstats,stats}" into ch_rm_orphan_stats_mqc
script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
prefix="${name}.mLb.clN"
"""
bampe_rm_orphan.py ${bam[0]} ${prefix}.bam --only_fr_pairs
samtools sort -@ $task.cpus -o ${prefix}.sorted.bam -T $prefix ${prefix}.bam
samtools index ${prefix}.sorted.bam
samtools flagstat ${prefix}.sorted.bam > ${prefix}.sorted.bam.flagstat
samtools idxstats ${prefix}.sorted.bam > ${prefix}.sorted.bam.idxstats
samtools stats ${prefix}.sorted.bam > ${prefix}.sorted.bam.stats
"""
}
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- MERGE LIBRARY BAM POST-ANALYSIS -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 5.1 preseq analysis after merging libraries and before filtering
*/
process preseq {
tag "$name"
label 'process_low'
publishDir "${params.outdir}/bwa/mergedLibrary/preseq", mode: 'copy'
when:
!params.skipPreseq
input:
set val(name), file(bam) from ch_merge_bam_preseq
output:
file "*.ccurve.txt" into ch_preseq_results
script:
prefix="${name}.mLb.clN"
"""
preseq lc_extrap -v -output ${prefix}.ccurve.txt -bam ${bam[0]}
"""
}
/*
* STEP 5.2 Picard CollectMultipleMetrics after merging libraries and filtering
*/
process collectMultipleMetrics {
tag "$name"
label 'process_medium'
publishDir path: "${params.outdir}/bwa/mergedLibrary", mode: 'copy',
saveAs: { filename ->
if (filename.endsWith("_metrics")) "picard_metrics/$filename"
else if (filename.endsWith(".pdf")) "picard_metrics/pdf/$filename"
else null
}
when:
!params.skipPicardMetrics
input:
set val(name), file(bam) from ch_rm_orphan_bam_metrics
file fasta from ch_fasta
output:
file "*_metrics" into ch_collectmetrics_mqc
file "*.pdf" into ch_collectmetrics_pdf
script:
prefix="${name}.mLb.clN"
if (!task.memory){
log.info "[Picard CollectMultipleMetrics] Available memory not known - defaulting to 3GB. Specify process memory requirements to change this."
avail_mem = 3
} else {
avail_mem = task.memory.toGiga()
}
"""
picard -Xmx${avail_mem}g CollectMultipleMetrics \\
INPUT=${bam[0]} \\
OUTPUT=${prefix}.CollectMultipleMetrics \\
REFERENCE_SEQUENCE=$fasta \\
VALIDATION_STRINGENCY=LENIENT \\
TMP_DIR=tmp
"""
}
/*
* STEP 5.3 Read depth normalised bigWig
*/
process bigWig {
tag "$name"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary/bigwig", mode: 'copy',
saveAs: {filename ->
if (filename.endsWith(".txt")) "scale/$filename"
else if (filename.endsWith(".bigWig")) "$filename"
else null
}
input:
set val(name), file(bam), file(flagstat) from ch_rm_orphan_bam_bigwig.join(ch_rm_orphan_flagstat_bigwig, by: [0])
file sizes from ch_genome_sizes_bigwig.collect()
output:
set val(name), file("*.bigWig") into ch_bigwig_plotprofile
file "*scale_factor.txt" into ch_bigwig_scale
file "*igv.txt" into ch_bigwig_igv
script:
prefix="${name}.mLb.clN"
pe_fragment = params.singleEnd ? "" : "-pc"
extend = (params.singleEnd && params.fragment_size > 0) ? "-fs ${params.fragment_size}" : ''
"""
SCALE_FACTOR=\$(grep 'mapped (' $flagstat | awk '{print 1000000/\$1}')
echo \$SCALE_FACTOR > ${prefix}.scale_factor.txt
genomeCoverageBed -ibam ${bam[0]} -bg -scale \$SCALE_FACTOR $pe_fragment $extend | sort -k1,1 -k2,2n > ${prefix}.bedGraph
bedGraphToBigWig ${prefix}.bedGraph $sizes ${prefix}.bigWig
find * -type f -name "*.bigWig" -exec echo -e "bwa/mergedLibrary/bigwig/"{}"\\t0,0,178" \\; > ${prefix}.bigWig.igv.txt
"""
}
/*
* STEP 5.4 generate gene body coverage plot with deepTools
*/
process plotProfile {
tag "$name"
label 'process_high'
publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: 'copy'
when:
!params.skipPlotProfile
input:
set val(name), file(bigwig) from ch_bigwig_plotprofile
file bed from ch_gene_bed
output:
file '*.{gz,pdf}' into ch_plotprofile_results
file '*.plotProfile.tab' into ch_plotprofile_mqc
script:
"""
computeMatrix scale-regions \\
--regionsFileName $bed \\
--scoreFileName $bigwig \\
--outFileName ${name}.computeMatrix.mat.gz \\
--outFileNameMatrix ${name}.computeMatrix.vals.mat.gz \\
--regionBodyLength 1000 \\
--beforeRegionStartLength 3000 \\
--afterRegionStartLength 3000 \\
--skipZeros \\
--smartLabels \\
-p $task.cpus
plotProfile --matrixFile ${name}.computeMatrix.mat.gz \\
--outFileName ${name}.plotProfile.pdf \\
--outFileNameData ${name}.plotProfile.tab
"""
}
/*
* STEP 5.5 Phantompeakqualtools
*/
process phantomPeakQualTools {
tag "$name"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary/phantompeakqualtools", mode: 'copy'
when:
!params.skipSpp
input:
set val(name), file(bam) from ch_rm_orphan_bam_phantompeakqualtools
file spp_correlation_header from ch_spp_correlation_header
file spp_nsc_header from ch_spp_nsc_header
file spp_rsc_header from ch_spp_rsc_header
output:
file '*.pdf' into ch_spp_plot
file '*.spp.out' into ch_spp_out,
ch_spp_out_mqc
file '*_mqc.tsv' into ch_spp_csv_mqc
script:
"""
RUN_SPP=`which run_spp.R`
Rscript -e "library(caTools); source(\\"\$RUN_SPP\\")" -c="${bam[0]}" -savp="${name}.spp.pdf" -savd="${name}.spp.Rdata" -out="${name}.spp.out" -p=$task.cpus
cp $spp_correlation_header ${name}_spp_correlation_mqc.tsv
Rscript -e "load('${name}.spp.Rdata'); write.table(crosscorr\\\$cross.correlation, file=\\"${name}_spp_correlation_mqc.tsv\\", sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE,append=TRUE)"
awk -v OFS='\t' '{print "${name}", \$9}' ${name}.spp.out | cat $spp_nsc_header - > ${name}_spp_nsc_mqc.tsv
awk -v OFS='\t' '{print "${name}", \$10}' ${name}.spp.out | cat $spp_rsc_header - > ${name}_spp_rsc_mqc.tsv
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- MERGE LIBRARY PEAK ANALYSIS -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
// Create channel linking IP bams with control bams
ch_rm_orphan_bam_macs_1.combine(ch_rm_orphan_bam_macs_2)
.set { ch_rm_orphan_bam_macs_1 }
ch_design_controls_csv.combine(ch_rm_orphan_bam_macs_1)
.filter { it[0] == it[5] && it[1] == it[7] }
.join(ch_rm_orphan_flagstat_macs)
.map { it -> it[2..-1] }
.into { ch_group_bam_macs;
ch_group_bam_plotfingerprint;
ch_group_bam_deseq }
/*
* STEP 6.1 deepTools plotFingerprint
*/
process plotFingerprint {
tag "${ip} vs ${control}"
label 'process_high'
publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: 'copy'
when:
!params.skipPlotFingerprint
input:
set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), file(ipbam), val(control), file(controlbam), file(ipflagstat) from ch_group_bam_plotfingerprint
output:
file '*.{txt,pdf}' into ch_plotfingerprint_results
file '*.raw.txt' into ch_plotfingerprint_mqc
script:
extend = (params.singleEnd && params.fragment_size > 0) ? "--extendReads ${params.fragment_size}" : ''
"""
plotFingerprint \\
--bamfiles ${ipbam[0]} ${controlbam[0]} \\
--plotFile ${ip}.plotFingerprint.pdf \\
$extend \\
--labels $ip $control \\
--outRawCounts ${ip}.plotFingerprint.raw.txt \\
--outQualityMetrics ${ip}.plotFingerprint.qcmetrics.txt \\
--skipZeros \\
--JSDsample ${controlbam[0]} \\
--numberOfProcessors ${task.cpus} \\
--numberOfSamples ${params.fingerprint_bins}
"""
}
/*
* STEP 6.2 Call peaks with MACS2 and calculate FRiP score
*/
process macsCallPeak {
tag "${ip} vs ${control}"
label 'process_long'
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}", mode: 'copy',
saveAs: {filename ->
if (filename.endsWith(".tsv")) "qc/$filename"
else if (filename.endsWith(".igv.txt")) null
else filename
}
when:
params.macs_gsize
input:
set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), file(ipbam), val(control), file(controlbam), file(ipflagstat) from ch_group_bam_macs
file peak_count_header from ch_peak_count_header
file frip_score_header from ch_frip_score_header
output:
set val(ip), file("*.{bed,xls,gappedPeak,bdg}") into ch_macs_output
set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), file("*.$peaktype") into ch_macs_homer, ch_macs_qc, ch_macs_consensus
file "*igv.txt" into ch_macs_igv
file "*_mqc.tsv" into ch_macs_mqc
script:
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
broad = params.narrowPeak ? '' : "--broad --broad-cutoff ${params.broad_cutoff}"
format = params.singleEnd ? "BAM" : "BAMPE"
pileup = params.saveMACSPileup ? "-B --SPMR" : ""
"""
macs2 callpeak \\
-t ${ipbam[0]} \\
-c ${controlbam[0]} \\
$broad \\
-f $format \\
-g ${params.macs_gsize} \\
-n $ip \\
$pileup \\
--keep-dup all \\
--nomodel
cat ${ip}_peaks.${peaktype} | wc -l | awk -v OFS='\t' '{ print "${ip}", \$1 }' | cat $peak_count_header - > ${ip}_peaks.count_mqc.tsv
READS_IN_PEAKS=\$(intersectBed -a ${ipbam[0]} -b ${ip}_peaks.${peaktype} -bed -c -f 0.20 | awk -F '\t' '{sum += \$NF} END {print sum}')
grep 'mapped (' $ipflagstat | awk -v a="\$READS_IN_PEAKS" -v OFS='\t' '{print "${ip}", a/\$1}' | cat $frip_score_header - > ${ip}_peaks.FRiP_mqc.tsv
find * -type f -name "*.${peaktype}" -exec echo -e "bwa/mergedLibrary/macs/${peaktype}/"{}"\\t0,0,178" \\; > ${ip}_peaks.${peaktype}.igv.txt
"""
}
/*
* STEP 6.3 Annotate peaks with HOMER
*/
process annotatePeaks {
tag "${ip} vs ${control}"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}", mode: 'copy'
when:
params.macs_gsize
input:
set val(antibody), val(replicatesExist), val(multipleGroups), val(ip), val(control), file(peak) from ch_macs_homer
file fasta from ch_fasta
file gtf from ch_gtf
output:
file "*.txt" into ch_macs_annotate
script:
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
"""
annotatePeaks.pl $peak \\
$fasta \\
-gid \\
-gtf $gtf \\
> ${ip}_peaks.annotatePeaks.txt
"""
}
/*
* STEP 6.4 Aggregated QC plots for peaks, FRiP and peak-to-gene annotation
*/
process peakQC {
label "process_medium"
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}/qc", mode: 'copy'
when:
params.macs_gsize
input:
file peaks from ch_macs_qc.collect{ it[-1] }
file annos from ch_macs_annotate.collect()
file peak_annotation_header from ch_peak_annotation_header
output:
file "*.{txt,pdf}" into ch_macs_qc_output
file "*.tsv" into ch_macs_qc_mqc
script: // This script is bundled with the pipeline, in nf-core/chipseq/bin/
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
"""
plot_macs_qc.r -i ${peaks.join(',')} \\
-s ${peaks.join(',').replaceAll("_peaks.${peaktype}","")} \\
-o ./ \\
-p macs_peak
plot_homer_annotatepeaks.r -i ${annos.join(',')} \\
-s ${annos.join(',').replaceAll("_peaks.annotatePeaks.txt","")} \\
-o ./ \\
-p macs_annotatePeaks
cat $peak_annotation_header macs_annotatePeaks.summary.txt > macs_annotatePeaks.summary_mqc.tsv
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- CONSENSUS PEAKS ANALYSIS -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
// group by ip from this point and carry forward boolean variables
ch_macs_consensus.map { it -> [ it[0], it[1], it[2], it[-1] ] }
.groupTuple()
.map { it -> [ it[0], it[1][0], it[2][0], it[3].sort() ] }
.set { ch_macs_consensus }
/*
* STEP 7.1 Consensus peaks across samples, create boolean filtering file, .saf file for featureCounts and UpSetR plot for intersection
*/
process createConsensusPeakSet {
tag "${antibody}"
label 'process_long'
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}/consensus/${antibody}", mode: 'copy',
saveAs: {filename ->
if (filename.endsWith(".igv.txt")) null
else filename
}
when:
params.macs_gsize && (replicatesExist || multipleGroups)
input:
set val(antibody), val(replicatesExist), val(multipleGroups), file(peaks) from ch_macs_consensus
output:
set val(antibody), val(replicatesExist), val(multipleGroups), file("*.bed") into ch_macs_consensus_bed
set val(antibody), file("*.saf") into ch_macs_consensus_saf
file "*.boolean.txt" into ch_macs_consensus_bool
file "*.intersect.{txt,plot.pdf}" into ch_macs_consensus_intersect
file "*igv.txt" into ch_macs_consensus_igv
script: // scripts are bundled with the pipeline, in nf-core/chipseq/bin/
prefix="${antibody}.consensus_peaks"
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
mergecols = params.narrowPeak ? (2..10).join(',') : (2..9).join(',')
collapsecols = params.narrowPeak ? (["collapse"]*9).join(',') : (["collapse"]*8).join(',')
expandparam = params.narrowPeak ? "--is_narrow_peak" : ""
"""
sort -k1,1 -k2,2n ${peaks.collect{it.toString()}.sort().join(' ')} \\
| mergeBed -c $mergecols -o $collapsecols > ${prefix}.txt
macs2_merged_expand.py ${prefix}.txt \\
${peaks.collect{it.toString()}.sort().join(',').replaceAll("_peaks.${peaktype}","")} \\
${prefix}.boolean.txt \\
--min_replicates $params.min_reps_consensus \\
$expandparam
awk -v FS='\t' -v OFS='\t' 'FNR > 1 { print \$1, \$2, \$3, \$4, "0", "+" }' ${prefix}.boolean.txt > ${prefix}.bed
echo -e "GeneID\tChr\tStart\tEnd\tStrand" > ${prefix}.saf
awk -v FS='\t' -v OFS='\t' 'FNR > 1 { print \$4, \$1, \$2, \$3, "+" }' ${prefix}.boolean.txt >> ${prefix}.saf
plot_peak_intersect.r -i ${prefix}.boolean.intersect.txt -o ${prefix}.boolean.intersect.plot.pdf
find * -type f -name "${prefix}.bed" -exec echo -e "bwa/mergedLibrary/macs/${peaktype}/consensus/${antibody}/"{}"\\t0,0,0" \\; > ${prefix}.bed.igv.txt
"""
}
/*
* STEP 7.2 Annotate consensus peaks with HOMER, and add annotation to boolean output file
*/
process annotateConsensusPeakSet {
tag "${antibody}"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}/consensus/${antibody}", mode: 'copy'
when:
params.macs_gsize && (replicatesExist || multipleGroups)
input:
set val(antibody), val(replicatesExist), val(multipleGroups), file(bed) from ch_macs_consensus_bed
file bool from ch_macs_consensus_bool
file fasta from ch_fasta
file gtf from ch_gtf
output:
file "*.annotatePeaks.txt" into ch_macs_consensus_annotate
script:
prefix="${antibody}.consensus_peaks"
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
"""
annotatePeaks.pl $bed \\
$fasta \\
-gid \\
-gtf $gtf \\
> ${prefix}.annotatePeaks.txt
cut -f2- ${prefix}.annotatePeaks.txt | awk 'NR==1; NR > 1 {print \$0 | "sort -k1,1 -k2,2n"}' | cut -f6- > tmp.txt
paste $bool tmp.txt > ${prefix}.boolean.annotatePeaks.txt
"""
}
// get bam and saf files for each ip
ch_group_bam_deseq.map { it -> [ it[3], [ it[0], it[1], it[2] ] ] }
.join(ch_rm_orphan_name_bam_counts)
.map { it -> [ it[1][0], it[1][1], it[1][2], it[2] ] }
.groupTuple()
.map { it -> [ it[0], it[1][0], it[2][0], it[3].flatten().sort() ] }
.join(ch_macs_consensus_saf)
.set { ch_group_bam_deseq }
/*
* STEP 7.3 Count reads in consensus peaks with featureCounts and perform differential analysis with DESeq2
*/
process deseqConsensusPeakSet {
tag "${antibody}"
label 'process_medium'
publishDir "${params.outdir}/bwa/mergedLibrary/macs/${peaktype}/consensus/${antibody}/deseq2", mode: 'copy',
saveAs: {filename ->
if (filename.endsWith(".igv.txt")) null
else filename
}
when:
params.macs_gsize && !params.skipDiffAnalysis && replicatesExist && multipleGroups
input:
set val(antibody), val(replicatesExist), val(multipleGroups), file(bams), file(saf) from ch_group_bam_deseq
file deseq2_pca_header from ch_deseq2_pca_header
file deseq2_clustering_header from ch_deseq2_clustering_header
output:
file "*featureCounts.txt" into ch_macs_consensus_counts
file "*featureCounts.txt.summary" into ch_macs_consensus_counts_mqc
file "*.{RData,results.txt,pdf,log}" into ch_macs_consensus_deseq_results
file "sizeFactors" into ch_macs_consensus_deseq_factors
file "*vs*/*.{pdf,txt}" into ch_macs_consensus_deseq_comp_results
file "*vs*/*.bed" into ch_macs_consensus_deseq_comp_bed
file "*igv.txt" into ch_macs_consensus_deseq_comp_igv
file "*.tsv" into ch_macs_consensus_deseq_mqc
script:
prefix="${antibody}.consensus_peaks"
peaktype = params.narrowPeak ? "narrowPeak" : "broadPeak"
bam_files = bams.findAll { it.toString().endsWith('.bam') }.sort()
bam_ext = params.singleEnd ? ".mLb.clN.sorted.bam" : ".mLb.clN.bam"
pe_params = params.singleEnd ? '' : "-p --donotsort"
"""
featureCounts -F SAF \\
-O \\
--fracOverlap 0.2 \\
-T $task.cpus \\
$pe_params \\
-a $saf \\
-o ${prefix}.featureCounts.txt \\
${bam_files.join(' ')}
featurecounts_deseq2.r -i ${prefix}.featureCounts.txt -b '$bam_ext' -o ./ -p $prefix -s .mLb
sed 's/deseq2_pca/deseq2_pca_${task.index}/g' <$deseq2_pca_header >tmp.txt
sed -i -e 's/DESeq2:/${antibody} DESeq2:/g' tmp.txt
cat tmp.txt ${prefix}.pca.vals.txt > ${prefix}.pca.vals_mqc.tsv
sed 's/deseq2_clustering/deseq2_clustering_${task.index}/g' <$deseq2_clustering_header >tmp.txt
sed -i -e 's/DESeq2:/${antibody} DESeq2:/g' tmp.txt
cat tmp.txt ${prefix}.sample.dists.txt > ${prefix}.sample.dists_mqc.tsv
find * -type f -name "*.FDR0.05.results.bed" -exec echo -e "bwa/mergedLibrary/macs/${peaktype}/consensus/${antibody}/deseq2/"{}"\\t255,0,0" \\; > ${prefix}.igv.txt
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- IGV -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 8 - Create IGV session file
*/
process igv {
publishDir "${params.outdir}/igv", mode: 'copy'
when:
!params.skipIGV
input:
file fasta from ch_fasta
file bigwigs from ch_bigwig_igv.collect().ifEmpty([])
file peaks from ch_macs_igv.collect().ifEmpty([])
file consensus_peaks from ch_macs_consensus_igv.collect().ifEmpty([])
file differential_peaks from ch_macs_consensus_deseq_comp_igv.collect().ifEmpty([])
output:
file "*.{txt,xml}" into ch_igv_session
script: // scripts are bundled with the pipeline, in nf-core/chipseq/bin/
"""
cat *.txt > igv_files.txt
igv_files_to_session.py igv_session.xml igv_files.txt ../reference_genome/${fasta.getName()} --path_prefix '../'
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- MULTIQC -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* Parse software version numbers
*/
process get_software_versions {
publishDir "${params.outdir}/pipeline_info", mode: 'copy',
saveAs: {filename ->
if (filename.indexOf(".csv") > 0) filename
else null
}
output:
file 'software_versions_mqc.yaml' into ch_software_versions_mqc
file "software_versions.csv"
script:
"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
fastqc --version > v_fastqc.txt
trim_galore --version > v_trim_galore.txt
echo \$(bwa 2>&1) > v_bwa.txt
samtools --version > v_samtools.txt
bedtools --version > v_bedtools.txt
echo \$(bamtools --version 2>&1) > v_bamtools.txt
echo \$(plotFingerprint --version 2>&1) > v_deeptools.txt || true
picard MarkDuplicates --version &> v_picard.txt || true
echo \$(R --version 2>&1) > v_R.txt
python -c "import pysam; print(pysam.__version__)" > v_pysam.txt
echo \$(macs2 --version 2>&1) > v_macs2.txt
touch v_homer.txt
echo \$(featureCounts -v 2>&1) > v_featurecounts.txt
preseq &> v_preseq.txt
multiqc --version > v_multiqc.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
}
def create_workflow_summary(summary) {
def yaml_file = workDir.resolve('workflow_summary_mqc.yaml')
yaml_file.text = """
id: 'nf-core-chipseq-summary'
description: " - this information is collected when the pipeline is started."
section_name: 'nf-core/chipseq Workflow Summary'
section_href: 'https://github.com/nf-core/chipseq'
plot_type: 'html'
data: |
<dl class=\"dl-horizontal\">
${summary.collect { k,v -> " <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>" }.join("\n")}
</dl>
""".stripIndent()
return yaml_file
}
/*
* STEP 9 - MultiQC
*/
process multiqc {
publishDir "${params.outdir}/multiqc", mode: 'copy'
when:
!params.skipMultiQC
input:
file multiqc_config from ch_multiqc_config
file ('fastqc/*') from ch_fastqc_reports_mqc.collect()
file ('trimgalore/*') from ch_trimgalore_results_mqc.collect()
file ('trimgalore/fastqc/*') from ch_trimgalore_fastqc_reports_mqc.collect()
file ('alignment/library/*') from ch_sort_bam_flagstat_mqc.collect()
file ('alignment/mergedLibrary/*') from ch_merge_bam_stats_mqc.collect()
file ('alignment/mergedLibrary/*') from ch_rm_orphan_flagstat_mqc.collect{it[1]}
file ('alignment/mergedLibrary/*') from ch_rm_orphan_stats_mqc.collect()
file ('alignment/mergedLibrary/picard_metrics/*') from ch_merge_bam_metrics_mqc.collect()
file ('alignment/mergedLibrary/picard_metrics/*') from ch_collectmetrics_mqc.collect()
file ('macs/*') from ch_macs_mqc.collect().ifEmpty([])
file ('macs/*') from ch_macs_qc_mqc.collect().ifEmpty([])
file ('macs/consensus/*') from ch_macs_consensus_counts_mqc.collect().ifEmpty([])
file ('macs/consensus/*') from ch_macs_consensus_deseq_mqc.collect().ifEmpty([])
file ('preseq/*') from ch_preseq_results.collect().ifEmpty([])
file ('deeptools/*') from ch_plotfingerprint_mqc.collect().ifEmpty([])
file ('deeptools/*') from ch_plotprofile_mqc.collect().ifEmpty([])
file ('phantompeakqualtools/*') from ch_spp_out_mqc.collect().ifEmpty([])
file ('phantompeakqualtools/*') from ch_spp_csv_mqc.collect().ifEmpty([])
file ('software_versions/*') from ch_software_versions_mqc.collect()
file ('workflow_summary/*') from create_workflow_summary(summary)
output:
file "*multiqc_report.html" into ch_multiqc_report
file "*_data"
file "multiqc_plots"
script:
rtitle = custom_runName ? "--title \"$custom_runName\"" : ''
rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : ''
mqcstats = params.skipMultiQCStats ? '--cl_config "skip_generalstats: true"' : ''
"""
multiqc . -f $rtitle $rfilename --config $multiqc_config \\
-m custom_content -m fastqc -m cutadapt -m samtools -m picard -m preseq -m featureCounts -m deeptools -m phantompeakqualtools \\
$mqcstats
"""
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- REPORTS/DOCUMENTATION -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/*
* STEP 10 - Output description HTML
*/
process output_documentation {
publishDir "${params.outdir}/Documentation", mode: 'copy'
input:
file output_docs from ch_output_docs
output:
file "results_description.html"
script:
"""
markdown_to_html.r $output_docs results_description.html
"""
}
/*
* Completion e-mail notification
*/
workflow.onComplete {
// Set up the e-mail variables
def subject = "[nf-core/chipseq] Successful: $workflow.runName"
if(!workflow.success){
subject = "[nf-core/chipseq] FAILED: $workflow.runName"
}
def email_fields = [:]
email_fields['version'] = workflow.manifest.version
email_fields['runName'] = custom_runName ?: workflow.runName
email_fields['success'] = workflow.success
email_fields['dateComplete'] = workflow.complete
email_fields['duration'] = workflow.duration
email_fields['exitStatus'] = workflow.exitStatus
email_fields['errorMessage'] = (workflow.errorMessage ?: 'None')
email_fields['errorReport'] = (workflow.errorReport ?: 'None')
email_fields['commandLine'] = workflow.commandLine
email_fields['projectDir'] = workflow.projectDir
email_fields['summary'] = summary
email_fields['summary']['Date Started'] = workflow.start
email_fields['summary']['Date Completed'] = workflow.complete
email_fields['summary']['Pipeline script file path'] = workflow.scriptFile
email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId
if(workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository
if(workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId
if(workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision
if(workflow.container) email_fields['summary']['Docker image'] = workflow.container
email_fields['summary']['Nextflow Version'] = workflow.nextflow.version
email_fields['summary']['Nextflow Build'] = workflow.nextflow.build
email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp
// On success try attach the multiqc report
def mqc_report = null
try {
if (workflow.success) {
mqc_report = ch_multiqc_report.getVal()
if (mqc_report.getClass() == ArrayList){
log.warn "[nf-core/chipseq] Found multiple reports from process 'multiqc', will use only one"
mqc_report = mqc_report[0]
}
}
} catch (all) {
log.warn "[nf-core/chipseq] Could not attach MultiQC report to summary email"
}
// Render the TXT template
def engine = new groovy.text.GStringTemplateEngine()
def tf = new File("$baseDir/assets/email_template.txt")
def txt_template = engine.createTemplate(tf).make(email_fields)
def email_txt = txt_template.toString()
// Render the HTML template
def hf = new File("$baseDir/assets/email_template.html")
def html_template = engine.createTemplate(hf).make(email_fields)
def email_html = html_template.toString()
// Render the sendmail template
def smail_fields = [ email: params.email, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.maxMultiqcEmailFileSize.toBytes() ]
def sf = new File("$baseDir/assets/sendmail_template.txt")
def sendmail_template = engine.createTemplate(sf).make(smail_fields)
def sendmail_html = sendmail_template.toString()
// Send the HTML e-mail
if (params.email) {
try {
if( params.plaintext_email ){ throw GroovyException('Send plaintext e-mail, not HTML') }
// Try to send HTML e-mail using sendmail
[ 'sendmail', '-t' ].execute() << sendmail_html
log.info "[nf-core/chipseq] Sent summary e-mail to $params.email (sendmail)"
} catch (all) {
// Catch failures and try with plaintext
[ 'mail', '-s', subject, params.email ].execute() << email_txt
log.info "[nf-core/chipseq] Sent summary e-mail to $params.email (mail)"
}
}
// Write summary e-mail HTML to a file
def output_d = new File( "${params.outdir}/pipeline_info/" )
if( !output_d.exists() ) {
output_d.mkdirs()
}
def output_hf = new File( output_d, "pipeline_report.html" )
output_hf.withWriter { w -> w << email_html }
def output_tf = new File( output_d, "pipeline_report.txt" )
output_tf.withWriter { w -> w << email_txt }
c_reset = params.monochrome_logs ? '' : "\033[0m";
c_purple = params.monochrome_logs ? '' : "\033[0;35m";
c_green = params.monochrome_logs ? '' : "\033[0;32m";
c_red = params.monochrome_logs ? '' : "\033[0;31m";
if (workflow.stats.ignoredCount > 0 && workflow.success) {
log.info "${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}"
log.info "${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}"
log.info "${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}"
}
if(workflow.success){
log.info "${c_purple}[nf-core/chipseq]${c_green} Pipeline completed successfully${c_reset}"
} else {
checkHostname()
log.info "${c_purple}[nf-core/chipseq]${c_red} Pipeline completed with errors${c_reset}"
}
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- NF-CORE HEADER -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
def nfcoreHeader(){
// Log colors ANSI codes
c_reset = params.monochrome_logs ? '' : "\033[0m";
c_dim = params.monochrome_logs ? '' : "\033[2m";
c_black = params.monochrome_logs ? '' : "\033[0;30m";
c_green = params.monochrome_logs ? '' : "\033[0;32m";
c_yellow = params.monochrome_logs ? '' : "\033[0;33m";
c_blue = params.monochrome_logs ? '' : "\033[0;34m";
c_purple = params.monochrome_logs ? '' : "\033[0;35m";
c_cyan = params.monochrome_logs ? '' : "\033[0;36m";
c_white = params.monochrome_logs ? '' : "\033[0;37m";
return """ ${c_dim}----------------------------------------------------${c_reset}
${c_green},--.${c_black}/${c_green},-.${c_reset}
${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset}
${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset}
${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset}
${c_green}`._,._,\'${c_reset}
${c_purple} nf-core/chipseq v${workflow.manifest.version}${c_reset}
${c_dim}----------------------------------------------------${c_reset}
""".stripIndent()
}
def checkHostname(){
def c_reset = params.monochrome_logs ? '' : "\033[0m"
def c_white = params.monochrome_logs ? '' : "\033[0;37m"
def c_red = params.monochrome_logs ? '' : "\033[1;91m"
def c_yellow_bold = params.monochrome_logs ? '' : "\033[1;93m"
if(params.hostnames){
def hostname = "hostname".execute().text.trim()
params.hostnames.each { prof, hnames ->
hnames.each { hname ->
if(hostname.contains(hname) && !workflow.profile.contains(prof)){
log.error "====================================================\n" +
" ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" +
" but your machine hostname is ${c_white}'$hostname'${c_reset}\n" +
" ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" +
"============================================================"
}
}
}
}
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
/* -- -- */
/* -- END OF PIPELINE -- */
/* -- -- */
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
Standardised workflows: monolithic, need to change often
Siloed tool containers: don't do much by themselves
Monolithic pipelines
The premise: move from workflow-based development to component-based development
Components are modular pieces with some basic rules (a minimal sketch follows this list):
Component A
- Input/Output
- Parameters
- Resources
Component B
- Input/Output
- Parameters
- Resources
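As an illustration only, and not FlowCraft's actual template code, a component can be thought of as a self-contained Nextflow process that declares exactly those three things: its input/output channels, the parameters it exposes, and the resources it requests. All names below (component_a_trim, fastq_in, fastq_trimmed, params.minLength) are hypothetical:
#!/usr/bin/env nextflow
// Hypothetical "Component A": read trimming.
//   Input/Output : fastq_in -> fastq_trimmed
//   Parameters   : params.minLength (exposed, with a default)
//   Resources    : cpus / memory directives
params.fastq = 'data/*_{1,2}.fastq.gz'
params.minLength = 36
Channel.fromFilePairs(params.fastq).set { fastq_in }
process component_a_trim {
    cpus 2
    memory '2 GB'
    input:
    set sample_id, file(reads) from fastq_in
    output:
    set sample_id, file('*_trim.fastq.gz') into fastq_trimmed
    script:
    """
    # Placeholder command; a real component would run the actual trimming tool here
    trimmomatic PE -threads ${task.cpus} ${reads[0]} ${reads[1]} ${sample_id}_1_trim.fastq.gz ${sample_id}_1_U.fastq.gz ${sample_id}_2_trim.fastq.gz ${sample_id}_2_U.fastq.gz MINLEN:${params.minLength}
    """
}
Because a component only talks to the outside world through its channels and parameters, any component that produces FastQ pairs can be wired into fastq_in, which is what makes the build commands below possible.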
With this framework, building workflows becomes dead simple:
flowcraft build -t 'trimmomatic fastqc spades pilon' -o my_nextflow_pipeline
Results in the following workflow DAG (directed acyclic graph)
It's easy to get experimental:
flowcraft build -t 'trimmomatic fastqc skesa pilon' -o my_nextflow_pipeline
Switch spades for skesa
It's easy to get wild:
flowcraft build -t 'reads_download (
spades | skesa pilon (abricate | chewbbaca) | megahit |
fastqc_trimmomatic fastqc (spades pilon (
mlst | prokka | chewbbaca) | skesa pilon abricate))' -o my_nextflow_pipeline
wait, what?
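The syntax is dense, but it follows a single rule: inside the -t string, parentheses fork the output of the preceding component into several lanes, and '|' separates those lanes. A smaller, purely illustrative build:
flowcraft build -t 'fastqc_trimmomatic (spades pilon | skesa pilon)' -o fork_demo
Here the trimmed reads are forked into two independent assembly lanes (spades then pilon, and skesa then pilon); the long command above just nests that pattern several levels deep. What follows is the Nextflow pipeline that FlowCraft generates from such a command.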
#!/usr/bin/env nextflow
import Helper
import CollectInitialMetadata
// Pipeline version
if (workflow.commitId){
version = "0.1 $workflow.revision"
} else {
version = "0.1 (local version)"
}
params.help = false
if (params.help){
Help.print_help(params)
exit 0
}
def infoMap = [:]
if (params.containsKey("fastq")){
infoMap.put("fastq", file(params.fastq).size())
}
if (params.containsKey("fasta")){
if (file(params.fasta) instanceof LinkedList){
infoMap.put("fasta", file(params.fasta).size())
} else {
infoMap.put("fasta", 1)
}
}
if (params.containsKey("accessions")){
// checks if params.accessions is different from null
if (params.accessions) {
BufferedReader reader = new BufferedReader(new FileReader(params.accessions));
int lines = 0;
while (reader.readLine() != null) lines++;
reader.close();
infoMap.put("accessions", lines)
}
}
Help.start_info(infoMap, "$workflow.start", "$workflow.profile")
CollectInitialMetadata.print_metadata(workflow)
// Placeholder for main input channels
if (!params.accessions){ exit 1, "'accessions' parameter missing" }
IN_accessions_raw = Channel.fromPath(params.accessions).ifEmpty { exit 1, "No accessions file provided with path:'${params.accessions}'" }
// Placeholder for secondary input channels
// Placeholder for extra input channels
// Placeholder to fork the raw input channel
IN_accessions_raw.set{ reads_download_in_1_0 }
if (params.asperaKey_1_1){
if (file(params.asperaKey_1_1).exists()){
IN_asperaKey_1_1 = Channel.fromPath(params.asperaKey_1_1)
} else {
IN_asperaKey_1_1 = Channel.value("")
}
} else {
IN_asperaKey_1_1 = Channel.value("")
}
process reads_download_1_1 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 1_1 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 1_1 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 1_1 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId reads_download_1_1 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { accession_id }
publishDir "reads", pattern: "${accession_id}/*fq.gz"
maxRetries 1
input:
set val(accession_id), val(name) from reads_download_in_1_0.splitText(){ it.trim() }.unique().filter{ it != "" }.map{ it.split().length > 1 ? ["accession": it.split()[0], "name": it.split()[1]] : [it.split()[0], null] }
each file(aspera_key) from IN_asperaKey_1_1
output:
set val({ "$name" != "null" ? "$name" : "$accession_id" }), file("${accession_id}/*fq.gz") optional true into _reads_download_out_1_0
set accession_id, val("1_1_reads_download"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_reads_download_1_1
set accession_id, val("reads_download_1_1"), val("1_1"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_reads_download_1_1
file ".versions"
script:
"""
{
# getSeqENA requires accession numbers to be provided as a text file
echo "${accession_id}" >> accession_file.txt
# Set default status value. It will be overwritten if anything goes wrong
echo "pass" > ".status"
if [ -f $aspera_key ]; then
asperaOpt="-a $aspera_key"
else
asperaOpt=""
fi
getSeqENA.py -l accession_file.txt \$asperaOpt -o ./ --SRAopt --downloadCramBam
# If a name has been provided along with the accession, rename the
# fastq files.
if [ $name != null ];
then
echo renaming pattern '${accession_id}' to '${name}' && cd ${accession_id} && rename "s/${accession_id}/${name}/" *.gz
fi
} || {
# If exit code other than 0
if [ \$? -eq 0 ]
then
echo "pass" > .status
else
echo "fail" > .status
echo "Could not download accession $accession_id" > .fail
fi
}
version_str="{'version':[{'program':'getSeqENA.py','version':'1.3'}]}"
echo \$version_str > .versions
"""
}
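// Fork the downloaded FastQ channel into one copy per downstream lane
// (spades, skesa, megahit, fastqc_trimmomatic) plus a copy kept aside for the
// later assembly_mapping step (_LAST_fastq_3_9).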
_reads_download_out_1_0.into{ reads_download_out_1_0;spades_in_1_1;skesa_in_1_2;megahit_in_1_3;fastqc_trimmomatic_in_1_4;_LAST_fastq_3_9 }
IN_genome_size_2_2 = Channel.value(params.genomeSize_2_2)
.map{it -> it.toString().isNumber() ? it : exit(1, "The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize_2_2}'")}
IN_min_coverage_2_2 = Channel.value(params.minCoverage_2_2)
.map{it -> it.toString().isNumber() ? it : exit(1, "The minCoverage parameter must be a number or a float. Provided value: '${params.minCoverage_2_2}'")}
process integrity_coverage_2_2 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 2_2 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 2_2 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 2_2 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId integrity_coverage_2_2 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(fastq_pair) from spades_in_1_1
val gsize from IN_genome_size_2_2
val cov from IN_min_coverage_2_2
// This channel is for the custom options of the integrity_coverage.py
// script. See the script's documentation for more information.
val opts from Channel.value('')
output:
set sample_id,
file(fastq_pair),
file('*_encoding'),
file('*_phred'),
file('*_coverage'),
file('*_max_len') into MAIN_integrity_2_2
file('*_report') optional true into LOG_report_coverage1_2_2
set sample_id, val("2_2_integrity_coverage"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_integrity_coverage_2_2
set sample_id, val("integrity_coverage_2_2"), val("2_2"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_integrity_coverage_2_2
file ".versions"
script:
template "integrity_coverage.py"
}
// TRIAGE OF CORRUPTED SAMPLES
LOG_corrupted_2_2 = Channel.create()
MAIN_PreCoverageCheck_2_2 = Channel.create()
// Corrupted samples have the 2nd value with 'corrupt'
MAIN_integrity_2_2.choice(LOG_corrupted_2_2, MAIN_PreCoverageCheck_2_2) {
a -> a[2].text == "corrupt" ? 0 : 1
}
// TRIAGE OF LOW COVERAGE SAMPLES
integrity_coverage_out_2_1_dep = Channel.create()
SIDE_phred_2_2 = Channel.create()
SIDE_max_len_2_2 = Channel.create()
MAIN_PreCoverageCheck_2_2
// Low coverage samples have the 4th value of the Channel with 'fail'
.filter{ it[4].text != "fail" }
// Forward the FastQ pair on the main channel and send the Phred score and
// maximum read length of each sample to the side channels
.separate(integrity_coverage_out_2_1_dep, SIDE_phred_2_2, SIDE_max_len_2_2){
a -> [ [a[0], a[1]], [a[0], a[3].text], [a[0], a[5].text] ]
}
/** REPORT_COVERAGE - PLUG-IN
This process will report the expected coverage for each non-corrupted sample
and write the results to 'reports/coverage/estimated_coverage_initial.csv'
*/
process report_coverage_2_2 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/coverage_2_2/'
input:
file(report) from LOG_report_coverage1_2_2.filter{ it.text != "corrupt" }.collect()
output:
file 'estimated_coverage_initial.csv'
"""
echo Sample,Estimated coverage,Status >> estimated_coverage_initial.csv
cat $report >> estimated_coverage_initial.csv
"""
}
/** REPORT_CORRUPT - PLUG-IN
This process will report the corrupted samples and write the results to
'reports/corrupted/corrupted_samples.txt'
*/
process report_corrupt_2_2 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/corrupted_2_2/'
input:
val sample_id from LOG_corrupted_2_2.collect{it[0]}
output:
file 'corrupted_samples.txt'
"""
echo ${sample_id.join(",")} | tr "," "\n" >> corrupted_samples.txt
"""
}
SIDE_max_len_2_2.set{ SIDE_max_len_2_3 }
if ( !params.spadesMinCoverage_2_3.toString().isNumber() ){
exit 1, "'spadesMinCoverage_2_3' parameter must be a number. Provided value: '${params.spadesMinCoverage_2_3}'"
}
if ( !params.spadesMinKmerCoverage_2_3.toString().isNumber()){
exit 1, "'spadesMinKmerCoverage_2_3' parameter must be a number. Provided value: '${params.spadesMinKmerCoverage_2_3}'"
}
IN_spades_opts_2_3 = Channel.value(
[params.spadesMinCoverage_2_3,
params.spadesMinKmerCoverage_2_3
])
if ( params.spadesKmers_2_3.toString().split(" ").size() <= 1 ){
if (params.spadesKmers_2_3.toString() != 'auto'){
exit 1, "'spadesKmers_2_3' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.spadesKmers_2_3}"
}
}
IN_spades_kmers_2_3 = Channel.value(params.spadesKmers_2_3)
clear = params.clearInput_2_3 ? "true" : "false"
disable_rr = params.disableRR_2_3 ? "true" : "false"
checkpointClear_2_3 = Channel.value(clear)
disableRR_2_3 = Channel.value(disable_rr)
process spades_2_3 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 2_3 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 2_3 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 2_3 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId spades_2_3 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir 'results/assembly/spades_2_3/', pattern: '*_spades*.fasta', mode: 'copy'
publishDir "results/assembly/spades_2_3/$sample_id", pattern: "*.gfa", mode: "copy"
publishDir "results/assembly/spades_2_3/$sample_id", pattern: "*.fastg", mode: "copy"
input:
set sample_id, file(fastq_pair), max_len from integrity_coverage_out_2_1_dep.join(SIDE_max_len_2_3)
val opts from IN_spades_opts_2_3
val kmers from IN_spades_kmers_2_3
val clear from checkpointClear_2_3
val disable_rr from disableRR_2_3
output:
set sample_id, file('*_spades*.fasta') into spades_out_2_1
file "*.fastg" optional true
file "*.gfa" into gfa1_2_3
set sample_id, val("2_3_spades"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_spades_2_3
set sample_id, val("spades_2_3"), val("2_3"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_spades_2_3
file ".versions"
script:
template "spades.py"
}
clear = params.clearInput_3_4 ? "true" : "false"
checkpointClear_3_4 = Channel.value(clear)
process skesa_3_4 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 3_4 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 3_4 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 3_4 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId skesa_3_4 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir 'results/assembly/skesa_3_4', pattern: '*skesa*.fasta', mode: 'copy'
input:
set sample_id, file(fastq_pair) from skesa_in_1_2
val clear from checkpointClear_3_4
output:
set sample_id, file('*.fasta') into skesa_out_3_2
set sample_id, val("3_4_skesa"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_skesa_3_4
set sample_id, val("skesa_3_4"), val("3_4"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_skesa_3_4
file ".versions"
script:
template "skesa.py"
}
IN_genome_size_4_5 = Channel.value(params.genomeSize_4_5)
.map{it -> it.toString().isNumber() ? it : exit(1, "The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize_4_5}'")}
IN_min_coverage_4_5 = Channel.value(params.minCoverage_4_5)
.map{it -> it.toString().isNumber() ? it : exit(1, "The minCoverage parameter must be a number or a float. Provided value: '${params.minCoverage_4_5}'")}
process integrity_coverage_4_5 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 4_5 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 4_5 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 4_5 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId integrity_coverage_4_5 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(fastq_pair) from megahit_in_1_3
val gsize from IN_genome_size_4_5
val cov from IN_min_coverage_4_5
// This channel is for the custom options of the integrity_coverage.py
// script. See the script's documentation for more information.
val opts from Channel.value('')
output:
set sample_id,
file(fastq_pair),
file('*_encoding'),
file('*_phred'),
file('*_coverage'),
file('*_max_len') into MAIN_integrity_4_5
file('*_report') optional true into LOG_report_coverage1_4_5
set sample_id, val("4_5_integrity_coverage"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_integrity_coverage_4_5
set sample_id, val("integrity_coverage_4_5"), val("4_5"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_integrity_coverage_4_5
file ".versions"
script:
template "integrity_coverage.py"
}
// TRIAGE OF CORRUPTED SAMPLES
LOG_corrupted_4_5 = Channel.create()
MAIN_PreCoverageCheck_4_5 = Channel.create()
// Corrupted samples have the 2nd value with 'corrupt'
MAIN_integrity_4_5.choice(LOG_corrupted_4_5, MAIN_PreCoverageCheck_4_5) {
a -> a[2].text == "corrupt" ? 0 : 1
}
// TRIAGE OF LOW COVERAGE SAMPLES
integrity_coverage_out_4_3_dep = Channel.create()
SIDE_phred_4_5 = Channel.create()
SIDE_max_len_4_5 = Channel.create()
MAIN_PreCoverageCheck_4_5
// Low coverage samples have the 4th value of the Channel with 'fail'
.filter{ it[4].text != "fail" }
// Forward the FastQ pair on the main channel and send the Phred score and
// maximum read length of each sample to the side channels
.separate(integrity_coverage_out_4_3_dep, SIDE_phred_4_5, SIDE_max_len_4_5){
a -> [ [a[0], a[1]], [a[0], a[3].text], [a[0], a[5].text] ]
}
/** REPORT_COVERAGE - PLUG-IN
This process will report the expected coverage for each non-corrupted sample
and write the results to 'reports/coverage/estimated_coverage_initial.csv'
*/
process report_coverage_4_5 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/coverage_4_5/'
input:
file(report) from LOG_report_coverage1_4_5.filter{ it.text != "corrupt" }.collect()
output:
file 'estimated_coverage_initial.csv'
"""
echo Sample,Estimated coverage,Status >> estimated_coverage_initial.csv
cat $report >> estimated_coverage_initial.csv
"""
}
/** REPORT_CORRUPT - PLUG-IN
This process will report the corrupted samples and write the results to
'reports/corrupted/corrupted_samples.txt'
*/
process report_corrupt_4_5 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/corrupted_4_5/'
input:
val sample_id from LOG_corrupted_4_5.collect{it[0]}
output:
file 'corrupted_samples.txt'
"""
echo ${sample_id.join(",")} | tr "," "\n" >> corrupted_samples.txt
"""
}
SIDE_max_len_4_5.set{ SIDE_max_len_4_6 }
if ( params.megahitKmers_4_6.toString().split(" ").size() <= 1 ){
if (params.megahitKmers_4_6.toString() != 'auto'){
exit 1, "'megahitKmers_4_6' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.megahitKmers_4_6}"
}
}
IN_megahit_kmers_4_6 = Channel.value(params.megahitKmers_4_6)
clear = params.clearInput_4_6 ? "true" : "false"
checkpointClear_4_6 = Channel.value(clear)
process megahit_4_6 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 4_6 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 4_6 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 4_6 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId megahit_4_6 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir 'results/assembly/megahit_4_6/', pattern: '*_megahit*.fasta', mode: 'copy'
input:
set sample_id, file(fastq_pair), max_len from integrity_coverage_out_4_3_dep.join(SIDE_max_len_4_6)
val kmers from IN_megahit_kmers_4_6
val clear from checkpointClear_4_6
output:
set sample_id, file('*megahit*.fasta') into megahit_out_4_3
set sample_id, file('megahit/intermediate_contigs/k*.contigs.fa') into IN_fastg4_6
set sample_id, val("4_6_megahit"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_megahit_4_6
set sample_id, val("megahit_4_6"), val("4_6"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_megahit_4_6
file ".versions"
script:
template "megahit.py"
}
fastg = params.fastg_4_6 ? "true" : "false"
process megahit_fastg_4_6{
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 4_6 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 4_6 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 4_6 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId megahit_4_6 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir "results/assembly/megahit_4_6/$sample_id", pattern: "*.fastg"
input:
set sample_id, file(kmer_files) from IN_fastg4_6
val run_fastg from fastg
output:
file "*.fastg" optional true
set sample_id, val("4_6_megahit_fastg"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_megahit_fastg_4_6
set sample_id, val("megahit_fastg_4_6"), val("4_6"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_megahit_fastg_4_6
file ".versions"
script:
"""
if [ ${run_fastg} == "true" ]
then
for kmer_file in ${kmer_files};
do
echo \$kmer_file
k=\$(echo \$kmer_file | cut -d '.' -f 1);
echo \$k
megahit_toolkit contig2fastg \$k \$kmer_file > \$kmer_file'.fastg';
done
fi
"""
}
IN_genome_size_5_7 = Channel.value(params.genomeSize_5_7)
.map{it -> it.toString().isNumber() ? it : exit(1, "The genomeSize parameter must be a number or a float. Provided value: '${params.genomeSize_5_7}'")}
IN_min_coverage_5_7 = Channel.value(params.minCoverage_5_7)
.map{it -> it.toString().isNumber() ? it : exit(1, "The minCoverage parameter must be a number or a float. Provided value: '${params.minCoverage_5_7}'")}
process integrity_coverage_5_7 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_7 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_7 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_7 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId integrity_coverage_5_7 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(fastq_pair) from fastqc_trimmomatic_in_1_4
val gsize from IN_genome_size_5_7
val cov from IN_min_coverage_5_7
// This channel is for the custom options of the integrity_coverage.py
// script. See the script's documentation for more information.
val opts from Channel.value('')
output:
set sample_id,
file(fastq_pair),
file('*_encoding'),
file('*_phred'),
file('*_coverage'),
file('*_max_len') into MAIN_integrity_5_7
file('*_report') optional true into LOG_report_coverage1_5_7
set sample_id, val("5_7_integrity_coverage"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_integrity_coverage_5_7
set sample_id, val("integrity_coverage_5_7"), val("5_7"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_integrity_coverage_5_7
file ".versions"
script:
template "integrity_coverage.py"
}
// TRIAGE OF CORRUPTED SAMPLES
LOG_corrupted_5_7 = Channel.create()
MAIN_PreCoverageCheck_5_7 = Channel.create()
// Corrupted samples have the 2nd value with 'corrupt'
MAIN_integrity_5_7.choice(LOG_corrupted_5_7, MAIN_PreCoverageCheck_5_7) {
a -> a[2].text == "corrupt" ? 0 : 1
}
// TRIAGE OF LOW COVERAGE SAMPLES
integrity_coverage_out_5_4_dep = Channel.create()
SIDE_phred_5_7 = Channel.create()
SIDE_max_len_5_7 = Channel.create()
MAIN_PreCoverageCheck_5_7
// Low coverage samples have the 4th value of the Channel with 'fail'
.filter{ it[4].text != "fail" }
// Forward the FastQ pair on the main channel and send the Phred score and
// maximum read length of each sample to the side channels
.separate(integrity_coverage_out_5_4_dep, SIDE_phred_5_7, SIDE_max_len_5_7){
a -> [ [a[0], a[1]], [a[0], a[3].text], [a[0], a[5].text] ]
}
/** REPORT_COVERAGE - PLUG-IN
This process will report the expected coverage for each non-corrupted sample
and write the results to 'reports/coverage/estimated_coverage_initial.csv'
*/
process report_coverage_5_7 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/coverage_5_7/'
input:
file(report) from LOG_report_coverage1_5_7.filter{ it.text != "corrupt" }.collect()
output:
file 'estimated_coverage_initial.csv'
"""
echo Sample,Estimated coverage,Status >> estimated_coverage_initial.csv
cat $report >> estimated_coverage_initial.csv
"""
}
/** REPORT_CORRUPT - PLUG-IN
This process will report the corrupted samples and write the results to
'reports/corrupted/corrupted_samples.txt'
*/
process report_corrupt_5_7 {
// This process can only use a single CPU
cpus 1
publishDir 'reports/corrupted_5_7/'
input:
val sample_id from LOG_corrupted_5_7.collect{it[0]}
output:
file 'corrupted_samples.txt'
"""
echo ${sample_id.join(",")} | tr "," "\n" >> corrupted_samples.txt
"""
}
SIDE_phred_5_7.set{ SIDE_phred_5_8 }
SIDE_max_len_5_7.set{ SIDE_max_len_8_14 }
// Check sliding window parameter
if ( params.trimSlidingWindow_5_8.toString().split(":").size() != 2 ){
exit 1, "'trimSlidingWindow_5_8' parameter must contain two values separated by a ':'. Provided value: '${params.trimSlidingWindow_5_8}'"
}
if ( !params.trimLeading_5_8.toString().isNumber() ){
exit 1, "'trimLeading_5_8' parameter must be a number. Provide value: '${params.trimLeading_5_8}'"
}
if ( !params.trimTrailing_5_8.toString().isNumber() ){
exit 1, "'trimTrailing_5_8' parameter must be a number. Provide value: '${params.trimTrailing_5_8}'"
}
if ( !params.trimMinLength_5_8.toString().isNumber() ){
exit 1, "'trimMinLength_5_8' parameter must be a number. Provide value: '${params.trimMinLength_5_8}'"
}
IN_trimmomatic_opts_5_8 = Channel.value([params.trimSlidingWindow_5_8,params.trimLeading_5_8,params.trimTrailing_5_8,params.trimMinLength_5_8])
IN_adapters_5_8 = Channel.value(params.adapters_5_8)
clear = params.clearInput_5_8 ? "true" : "false"
checkpointClear_5_8 = Channel.value(clear)
process fastqc_5_8 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_8 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId fastqc_trimmomatic_5_8 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir "reports/fastqc_5_8/", pattern: "*.html"
input:
set sample_id, file(fastq_pair) from integrity_coverage_out_5_4_dep
val ad from Channel.value('None')
output:
set sample_id, file(fastq_pair), file('pair_1*'), file('pair_2*') into MAIN_fastqc_out_5_8
file "*html"
set sample_id, val("5_8_fastqc"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_fastqc_5_8
set sample_id, val("fastqc_5_8"), val("5_8"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc_5_8
file ".versions"
script:
template "fastqc.py"
}
/** FASTQC_REPORT - MAIN
This process will parse the result files from a FastQC analysis and output
the optimal_trim information for Trimmomatic
*/
process fastqc_report_5_8 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_8 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId fastqc_trimmomatic_5_8 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
publishDir 'reports/fastqc_5_8/run_1/', pattern: '*summary.txt', mode: 'copy'
input:
set sample_id, file(fastq_pair), file(result_p1), file(result_p2) from MAIN_fastqc_out_5_8
val opts from Channel.value("--ignore-tests")
output:
set sample_id, file(fastq_pair), 'optimal_trim', ".status" into _MAIN_fastqc_trim_5_8
file '*_trim_report' into LOG_trim_5_8
file "*_status_report" into LOG_fastqc_report_5_8
file "${sample_id}_*_summary.txt" optional true
set sample_id, val("5_8_fastqc_report"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_fastqc_report_5_8
set sample_id, val("fastqc_report_5_8"), val("5_8"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc_report_5_8
file ".versions"
script:
template "fastqc_report.py"
}
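// Keep only the samples whose FastQC status file reads "pass" and replace the
// 'optimal_trim' file with its text content, so Trimmomatic receives the trim
// range as a plain value.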
MAIN_fastqc_trim_5_8 = Channel.create()
_MAIN_fastqc_trim_5_8
.filter{ it[3].text == "pass" }
.map{ [it[0], it[1], file(it[2]).text] }
.into(MAIN_fastqc_trim_5_8)
/** TRIM_REPORT - PLUG-IN
This will collect the optimal trim points assessed by the fastqc_report
process and write the results of all samples in a single csv file
*/
process trim_report_5_8 {
publishDir 'reports/fastqc_5_8/', mode: 'copy'
input:
file trim from LOG_trim_5_8.collect()
output:
file "FastQC_trim_report.csv"
"""
echo Sample,Trim begin, Trim end >> FastQC_trim_report.csv
cat $trim >> FastQC_trim_report.csv
"""
}
process compile_fastqc_status_5_8 {
publishDir 'reports/fastqc_5_8/', mode: 'copy'
input:
file rep from LOG_fastqc_report_5_8.collect()
output:
file 'FastQC_1run_report.csv'
"""
echo Sample, Failed? >> FastQC_1run_report.csv
cat $rep >> FastQC_1run_report.csv
"""
}
/** TRIMMOMATIC - MAIN
This process will execute trimmomatic. Currently, the main channel requires
information on the trim_range and phred score.
*/
process trimmomatic_5_8 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_8 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_8 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId fastqc_trimmomatic_5_8 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir "results/trimmomatic_5_8", pattern: "*.gz"
input:
set sample_id, file(fastq_pair), trim_range, phred from MAIN_fastqc_trim_5_8.join(SIDE_phred_5_8)
val opts from IN_trimmomatic_opts_5_8
val ad from IN_adapters_5_8
val clear from checkpointClear_5_8
output:
set sample_id, "${sample_id}_*trim.fastq.gz" into fastqc_trimmomatic_out_5_4
file 'trimmomatic_report.csv'
set sample_id, val("5_8_trimmomatic"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_trimmomatic_5_8
set sample_id, val("trimmomatic_5_8"), val("5_8"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_trimmomatic_5_8
file ".versions"
script:
template "trimmomatic.py"
}
if ( !params.minAssemblyCoverage_3_9.toString().isNumber() ){
if (params.minAssemblyCoverage_3_9.toString() != 'auto'){
exit 1, "'minAssemblyCoverage_3_9' parameter must be a number or 'auto'. Provided value: ${params.minAssemblyCoverage_3_9}"
}
}
if ( !params.AMaxContigs_3_9.toString().isNumber() ){
exit 1, "'AMaxContigs_3_9' parameter must be a number. Provide value: '${params.AMaxContigs_3_9}'"
}
IN_assembly_mapping_opts_3_9 = Channel.value([params.minAssemblyCoverage_3_9,params.AMaxContigs_3_9])
IN_genome_size_3_9 = Channel.value(params.genomeSize_3_9)
process assembly_mapping_3_9 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 3_9 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 3_9 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 3_9 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_3_9 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(fastq) from skesa_out_3_2.join(_LAST_fastq_3_9)
output:
set sample_id, file(assembly), 'coverages.tsv', 'coverage_per_bp.tsv', 'sorted.bam', 'sorted.bam.bai' into MAIN_am_out_3_9
set sample_id, file("coverage_per_bp.tsv") optional true into SIDE_BpCoverage_3_9
set sample_id, val("3_9_assembly_mapping"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_assembly_mapping_3_9
set sample_id, val("assembly_mapping_3_9"), val("3_9"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_assembly_mapping_3_9
file ".versions"
script:
"""
{
echo [DEBUG] BUILDING BOWTIE INDEX FOR ASSEMBLY: $assembly >> .command.log 2>&1
bowtie2-build --threads ${task.cpus} $assembly genome_index >> .command.log 2>&1
echo [DEBUG] MAPPING READS FROM $fastq >> .command.log 2>&1
bowtie2 -q --very-sensitive-local --threads ${task.cpus} -x genome_index -1 ${fastq[0]} -2 ${fastq[1]} -S mapping.sam >> .command.log 2>&1
echo [DEBUG] CONVERTING AND SORTING SAM TO BAM >> .command.log 2>&1
samtools sort -o sorted.bam -O bam -@ ${task.cpus} mapping.sam && rm *.sam >> .command.log 2>&1
echo [DEBUG] CREATING BAM INDEX >> .command.log 2>&1
samtools index sorted.bam >> .command.log 2>&1
echo [DEBUG] ESTIMATING READ DEPTH >> .command.log 2>&1
parallel -j ${task.cpus} samtools depth -ar {} sorted.bam \\> {}.tab ::: \$(grep ">" $assembly | cut -c 2- | tr " " "_")
# Insert 0 coverage count in empty files. See Issue #2
echo [DEBUG] REMOVING EMPTY FILES >> .command.log 2>&1
find . -size 0 -print0 | xargs -0 -I{} sh -c 'echo -e 0"\t"0"\t"0 > "{}"'
echo [DEBUG] COMPILING COVERAGE REPORT >> .command.log 2>&1
parallel -j ${task.cpus} echo -n {.} '"\t"' '&&' cut -f3 {} '|' paste -sd+ '|' bc >> coverages.tsv ::: *.tab
cat *.tab > coverage_per_bp.tsv
rm *.tab
if [ -f "coverages.tsv" ]
then
echo pass > .status
else
echo fail > .status
fi
echo -n "" > .report.json
echo -n "" > .versions
} || {
echo fail > .status
}
"""
}
/** PROCESS_ASSEMBLY_MAPPING - MAIN
Processes the results from the assembly_mapping process and filters the
assembly contigs based on coverage and length thresholds.
*/
process process_assembly_mapping_3_9 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 3_9 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 3_9 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 3_9 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_3_9 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(assembly), file(coverage), file(coverage_bp), file(bam_file), file(bam_index) from MAIN_am_out_3_9
val opts from IN_assembly_mapping_opts_3_9
val gsize from IN_genome_size_3_9
output:
set sample_id, '*_filt.fasta', 'filtered.bam', 'filtered.bam.bai' into assembly_mapping_out_3_5_dep
set sample_id, val("3_9_process_am"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_process_am_3_9
set sample_id, val("process_am_3_9"), val("3_9"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_process_am_3_9
file ".versions"
script:
template "process_assembly_mapping.py"
}
SIDE_BpCoverage_3_9.set{ SIDE_BpCoverage_3_10 }
clear = params.clearInput_3_10 ? "true" : "false"
checkpointClear_3_10 = Channel.value(clear)
process pilon_3_10 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 3_10 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 3_10 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 3_10 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_3_10 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
echo false
publishDir 'results/assembly/pilon_3_10/', mode: 'copy', pattern: "*.fasta"
input:
set sample_id, file(assembly), file(bam_file), file(bam_index) from assembly_mapping_out_3_5_dep
val clear from checkpointClear_3_10
output:
set sample_id, '*_polished.fasta' into _pilon_out_3_5, pilon_report_3_10
set sample_id, val("3_10_pilon"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_3_10
set sample_id, val("pilon_3_10"), val("3_10"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_3_10
file ".versions"
script:
"""
{
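# Convert Nextflow's memory setting (e.g. "4 GB") into a Java heap string (e.g. "4G") for the -Xmx flag below.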
pilon_mem=${String.valueOf(task.memory).substring(0, String.valueOf(task.memory).length() - 1).replaceAll("\\s", "")}
java -jar -Xms256m -Xmx\${pilon_mem} /NGStools/pilon-1.22.jar --genome $assembly --frags $bam_file --output ${assembly.name.replaceFirst(~/\.[^\.]+$/, '')}_polished --changes --threads $task.cpus >> .command.log 2>&1
echo pass > .status
if [ "$clear" = "true" ];
then
work_regex=".*/work/.{2}/.{30}/.*"
assembly_file=\$(readlink -f \$(pwd)/${assembly})
bam_file=\$(readlink -f \$(pwd)/${bam_file})
if [[ "\$assembly_file" =~ \$work_regex ]]; then
rm \$assembly_file \$bam_file
fi
fi
} || {
echo fail > .status
}
"""
}
process pilon_report_3_10 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
afterScript "report_POST.sh $params.projectId $params.pipelineId 3_10 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_3_10 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(coverage_bp) from pilon_report_3_10.join(SIDE_BpCoverage_3_10)
output:
file "*_assembly_report.csv" into pilon_report_out_3_10
set sample_id, val("3_10_pilon_report"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_report_3_10
set sample_id, val("pilon_report_3_10"), val("3_10"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_report_3_10
file ".versions"
script:
template "assembly_report.py"
}
process compile_pilon_report_3_10 {
publishDir "reports/assembly/pilon_3_10/", mode: 'copy'
input:
file(report) from pilon_report_out_3_10.collect()
output:
file "pilon_assembly_report.csv"
"""
echo Sample,Number of contigs,Average contig size,N50,Total assembly length,GC content,Missing data > pilon_assembly_report.csv
cat $report >> pilon_assembly_report.csv
"""
}
_pilon_out_3_5.into{ pilon_out_3_5;abricate_in_3_7;chewbbaca_in_3_8 }
IN_adapters_5_11 = Channel.value(params.adapters_5_11)
process fastqc2_5_11 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_11 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_11 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_11 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId fastqc_5_11 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir "reports/fastqc_5_11/", pattern: "*.html"
input:
set sample_id, file(fastq_pair) from fastqc_trimmomatic_out_5_4
val ad from IN_adapters_5_11
output:
set sample_id, file(fastq_pair), file('pair_1*'), file('pair_2*') into MAIN_fastqc_out_5_11
file "*html"
set sample_id, val("5_11_fastqc2"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_fastqc2_5_11
set sample_id, val("fastqc2_5_11"), val("5_11"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc2_5_11
file ".versions"
script:
template "fastqc.py"
}
process fastqc2_report_5_11 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 5_11 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 5_11 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 5_11 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId fastqc_5_11 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
publishDir 'reports/fastqc_5_11/run_2/', pattern: '*summary.txt', mode: 'copy'
input:
set sample_id, file(fastq_pair), file(result_p1), file(result_p2) from MAIN_fastqc_out_5_11
val opts from Channel.value("")
output:
set sample_id, file(fastq_pair), '.status' into MAIN_fastqc_report_5_11
file "*_status_report" into LOG_fastqc_report_5_11
file "${sample_id}_*_summary.txt" optional true
set sample_id, val("5_11_fastqc2_report"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_fastqc2_report_5_11
set sample_id, val("fastqc2_report_5_11"), val("5_11"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_fastqc2_report_5_11
file ".versions"
script:
template "fastqc_report.py"
}
process compile_fastqc_status2_5_11 {
publishDir 'reports/fastqc_5_11/', mode: 'copy'
input:
file rep from LOG_fastqc_report_5_11.collect()
output:
file 'FastQC_2run_report.csv'
"""
echo Sample, Failed? >> FastQC_2run_report.csv
cat $rep >> FastQC_2run_report.csv
"""
}
_fastqc_out_5_6 = Channel.create()
MAIN_fastqc_report_5_11
.filter{ it[2].text == "pass" }
.map{ [it[0], it[1]] }
.into(_fastqc_out_5_6)
_fastqc_out_5_6.into{ fastqc_out_5_6;spades_in_5_9;skesa_in_5_10;_LAST_fastq_8_16;_LAST_fastq_9_18 }
if ( params.abricateDataDir_6_12 ){
if ( !file(params.abricateDataDir_6_12).exists() ){
exit 1, "'abricateDataDir_6_12' data directory was not found: '${params.abricateDatabases_6_12}'"
}
dataDirOpt = "--datadir ${params.abricateDataDir_6_12}"
} else {
dataDirOpt = ""
}
if ( !params.abricateMinId_6_12.toString().isNumber() ){
exit 1, "'abricateMinId_6_12' parameter must be a number. Provide value: '${params.abricateMinId_6_12}'"
}
if ( !params.abricateMinCov_6_12.toString().isNumber() ){
exit 1, "'abricateMinCov_6_12' parameter must be a number. Provide value: '${params.abricateMinCov_6_12}'"
}
process abricate_6_12 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 6_12 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 6_12 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 6_12 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId abricate_6_12 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { "${sample_id} ${db}" }
publishDir "results/annotation/abricate_6_12/${sample_id}"
input:
set sample_id, file(assembly) from abricate_in_3_7
each db from params.abricateDatabases_6_12
val min_id from Channel.value(params.abricateMinId_6_12)
val min_cov from Channel.value(params.abricateMinCov_6_12)
output:
file '*.tsv' into abricate_out_6_12
set sample_id, val("6_12_abricate_$db"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_abricate_6_12
set sample_id, val("abricate_6_12_$db"), val("6_12"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_abricate_6_12
file ".versions"
script:
"""
{
# Run abricate
abricate $dataDirOpt --minid $min_id --mincov $min_cov --db $db $assembly > ${sample_id}_abr_${db}.tsv
echo pass > .status
} || {
echo fail > .status
}
"""
}
process process_abricate_6_12 {
tag "process_abricate_6_12"
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
afterScript "report_POST.sh $params.projectId $params.pipelineId 6_12 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId abricate_6_12 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
}
input:
file abricate_file from abricate_out_6_12.collect()
output:
set val('process_abricate'), val("6_12_process_abricate"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_process_abricate_6_12
set val('process_abricate'), val("process_abricate_6_12"), val("6_12"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_process_abricate_6_12
file ".versions"
script:
template "process_abricate.py"
}
if ( !params.schemaPath_7_13 ){
exit 1, "'schemaPath_7_13' parameter missing"
}
if ( params.chewbbacaTraining_7_13){
if (!file(params.chewbbacaTraining_7_13).exists()) {
exit 1, "'chewbbacaTraining_7_13' file was not found: '${params.chewbbacaTraining_7_13}'"
}
}
if ( params.schemaSelectedLoci_7_13){
if (!file(params.schemaSelectedLoci_7_13).exists()) {
exit 1, "'schemaSelectedLoci_7_13' file was not found: '${params.schemaSelectedLoci_7_13}'"
}
}
if ( params.schemaCore_7_13){
if (!file(params.schemaCore_7_13).exists()) {
exit 1, "'schemaCore_7_13' file was not found: '${params.schemaCore_7_13}'"
}
}
IN_schema_7_13 = Channel.fromPath(params.schemaPath_7_13)
if (params.chewbbacaJson_7_13 == true){
jsonOpt = "--json"
} else {
jsonOpt = ""
}
if (params.chewbbacaTraining_7_13){
training = "--ptf ${params.chewbbacaTraining_7_13}"
} else {
training = ""
}
// If chewbbaca is executed in batch mode, wait for all assembly files
// to be collected on the input channel, and only then execute chewbbaca
// providing all samples simultaneously
if (params.chewbbacaBatch_7_13) {
process chewbbaca_batch_7_13 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 7_13 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 7_13 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 7_13 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId chewbbaca_7_13 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
maxForks 1
scratch false
if (params.chewbbacaQueue_7_13 != null) {
queue "${params.chewbbacaQueue_7_13}"
}
publishDir "results/chewbbaca_alleleCall_7_13/", mode: "copy"
input:
file assembly from chewbbaca_in_3_8.map{ it[1] }.collect()
each file(schema) from IN_schema_7_13
output:
file 'chew_results*'
file 'cgMLST.tsv' optional true into chewbbacaProfile_7_13
set val('single'), val("7_13_chewbbaca"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_chewbbaca_7_13
set val('single'), val("chewbbaca_7_13"), val("7_13"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_chewbbaca_7_13
file ".versions"
script:
"""
{
set -x
if [ -d "$schema/temp" ];
then
rm -r $schema/temp
fi
if [ "$params.schemaSelectedLoci_7_13" = "null" ];
then
inputGenomes=$schema
else
inputGenomes=${params.schemaSelectedLoci_7_13}
fi
echo $assembly | tr " " "\n" >> input_file.txt
chewBBACA.py AlleleCall -i input_file.txt -g \$inputGenomes -o chew_results $jsonOpt --cpu $task.cpus $training
if [ "$jsonOpt" = "--json" ]; then
merge_json.py ${params.schemaCore_7_13} chew_results/*/results*
else
cp chew_results*/*/results_alleles.tsv cgMLST.tsv
fi
} || {
echo fail > .status
}
"""
}
} else {
process chewbbaca_7_13 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 7_13 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 7_13 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 7_13 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId chewbbaca_7_13 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
maxForks 1
tag { sample_id }
scratch true
if (params.chewbbacaQueue_7_13 != null) {
queue "${params.chewbbacaQueue_7_13}"
}
publishDir "results/chewbbaca_alleleCall_7_13/", mode: "copy"
input:
set sample_id, file(assembly) from chewbbaca_in_3_8
each file(schema) from IN_schema_7_13
output:
file 'chew_results*'
file '*_cgMLST.tsv' optional true into chewbbacaProfile_7_13
set sample_id, val("7_13_chewbbaca"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_chewbbaca_7_13
set sample_id, val("chewbbaca_7_13"), val("7_13"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_chewbbaca_7_13
file ".versions"
script:
"""
{
set -x
if [ -d "$schema/temp" ];
then
rm -r $schema/temp
fi
if [ "$params.schemaSelectedLoci_7_13" = "null" ];
then
inputGenomes=$schema
else
inputGenomes=${params.schemaSelectedLoci_7_13}
fi
echo $assembly >> input_file.txt
chewBBACA.py AlleleCall -i input_file.txt -g \$inputGenomes -o chew_results_${sample_id} $jsonOpt --cpu $task.cpus $training --fc
if [ "$jsonOpt" = "--json" ]; then
merge_json.py ${params.schemaCore_7_13} chew_results_*/*/results* ${sample_id}
else
mv chew_results_*/*/results_alleles.tsv ${sample_id}_cgMLST.tsv
fi
} || {
echo fail > .status
}
"""
}
}
process chewbbacaExtractMLST_7_13 {
publishDir "results/chewbbaca_7_13/", mode: "copy", overwrite: true
input:
file profiles from chewbbacaProfile_7_13.collect()
output:
file "results/cgMLST.tsv"
"""
head -n1 ${profiles[0]} > chewbbaca_profiles.tsv
awk 'FNR == 2' $profiles >> chewbbaca_profiles.tsv
chewBBACA.py ExtractCgMLST -i chewbbaca_profiles.tsv -o results -p $params.chewbbacaProfilePercentage_7_13
"""
}
if ( !params.spadesMinCoverage_8_14.toString().isNumber() ){
exit 1, "'spadesMinCoverage_8_14' parameter must be a number. Provided value: '${params.spadesMinCoverage_8_14}'"
}
if ( !params.spadesMinKmerCoverage_8_14.toString().isNumber()){
exit 1, "'spadesMinKmerCoverage_8_14' parameter must be a number. Provided value: '${params.spadesMinKmerCoverage_8_14}'"
}
IN_spades_opts_8_14 = Channel.value(
[params.spadesMinCoverage_8_14,
params.spadesMinKmerCoverage_8_14
])
if ( params.spadesKmers_8_14.toString().split(" ").size() <= 1 ){
if (params.spadesKmers_8_14.toString() != 'auto'){
exit 1, "'spadesKmers_8_14' parameter must be a sequence of space separated numbers or 'auto'. Provided value: ${params.spadesKmers_8_14}"
}
}
IN_spades_kmers_8_14 = Channel.value(params.spadesKmers_8_14)
clear = params.clearInput_8_14 ? "true" : "false"
disable_rr = params.disableRR_8_14 ? "true" : "false"
checkpointClear_8_14 = Channel.value(clear)
disableRR_8_14 = Channel.value(disable_rr)
process spades_8_14 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 8_14 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 8_14 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 8_14 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId spades_8_14 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir 'results/assembly/spades_8_14/', pattern: '*_spades*.fasta', mode: 'copy'
publishDir "results/assembly/spades_8_14/$sample_id", pattern: "*.gfa", mode: "copy"
publishDir "results/assembly/spades_8_14/$sample_id", pattern: "*.fastg", mode: "copy"
input:
set sample_id, file(fastq_pair), max_len from spades_in_5_9.join(SIDE_max_len_8_14)
val opts from IN_spades_opts_8_14
val kmers from IN_spades_kmers_8_14
val clear from checkpointClear_8_14
val disable_rr from disableRR_8_14
output:
set sample_id, file('*_spades*.fasta') into spades_out_8_9
file "*.fastg" optional true
file "*.gfa" into gfa1_8_14
set sample_id, val("8_14_spades"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_spades_8_14
set sample_id, val("spades_8_14"), val("8_14"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_spades_8_14
file ".versions"
script:
template "spades.py"
}
clear = params.clearInput_9_15 ? "true" : "false"
checkpointClear_9_15 = Channel.value(clear)
process skesa_9_15 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 9_15 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 9_15 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 9_15 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId skesa_9_15 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir 'results/assembly/skesa_9_15', pattern: '*skesa*.fasta', mode: 'copy'
input:
set sample_id, file(fastq_pair) from skesa_in_5_10
val clear from checkpointClear_9_15
output:
set sample_id, file('*.fasta') into skesa_out_9_10
set sample_id, val("9_15_skesa"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_skesa_9_15
set sample_id, val("skesa_9_15"), val("9_15"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_skesa_9_15
file ".versions"
script:
template "skesa.py"
}
if ( !params.minAssemblyCoverage_8_16.toString().isNumber() ){
if (params.minAssemblyCoverage_8_16.toString() != 'auto'){
exit 1, "'minAssemblyCoverage_8_16' parameter must be a number or 'auto'. Provided value: ${params.minAssemblyCoverage_8_16}"
}
}
if ( !params.AMaxContigs_8_16.toString().isNumber() ){
exit 1, "'AMaxContigs_8_16' parameter must be a number. Provide value: '${params.AMaxContigs_8_16}'"
}
IN_assembly_mapping_opts_8_16 = Channel.value([params.minAssemblyCoverage_8_16,params.AMaxContigs_8_16])
IN_genome_size_8_16 = Channel.value(params.genomeSize_8_16)
process assembly_mapping_8_16 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 8_16 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 8_16 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 8_16 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_8_16 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(fastq) from spades_out_8_9.join(_LAST_fastq_8_16)
output:
set sample_id, file(assembly), 'coverages.tsv', 'coverage_per_bp.tsv', 'sorted.bam', 'sorted.bam.bai' into MAIN_am_out_8_16
set sample_id, file("coverage_per_bp.tsv") optional true into SIDE_BpCoverage_8_16
set sample_id, val("8_16_assembly_mapping"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_assembly_mapping_8_16
set sample_id, val("assembly_mapping_8_16"), val("8_16"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_assembly_mapping_8_16
file ".versions"
script:
"""
{
echo [DEBUG] BUILDING BOWTIE INDEX FOR ASSEMBLY: $assembly >> .command.log 2>&1
bowtie2-build --threads ${task.cpus} $assembly genome_index >> .command.log 2>&1
echo [DEBUG] MAPPING READS FROM $fastq >> .command.log 2>&1
bowtie2 -q --very-sensitive-local --threads ${task.cpus} -x genome_index -1 ${fastq[0]} -2 ${fastq[1]} -S mapping.sam >> .command.log 2>&1
echo [DEBUG] CONVERTING AND SORTING SAM TO BAM >> .command.log 2>&1
samtools sort -o sorted.bam -O bam -@ ${task.cpus} mapping.sam && rm *.sam >> .command.log 2>&1
echo [DEBUG] CREATING BAM INDEX >> .command.log 2>&1
samtools index sorted.bam >> .command.log 2>&1
echo [DEBUG] ESTIMATING READ DEPTH >> .command.log 2>&1
parallel -j ${task.cpus} samtools depth -ar {} sorted.bam \\> {}.tab ::: \$(grep ">" $assembly | cut -c 2- | tr " " "_")
# Insert 0 coverage count in empty files. See Issue #2
echo [DEBUG] REMOVING EMPTY FILES >> .command.log 2>&1
find . -size 0 -print0 | xargs -0 -I{} sh -c 'echo -e 0"\t"0"\t"0 > "{}"'
echo [DEBUG] COMPILING COVERAGE REPORT >> .command.log 2>&1
parallel -j ${task.cpus} echo -n {.} '"\t"' '&&' cut -f3 {} '|' paste -sd+ '|' bc >> coverages.tsv ::: *.tab
cat *.tab > coverage_per_bp.tsv
rm *.tab
if [ -f "coverages.tsv" ]
then
echo pass > .status
else
echo fail > .status
fi
echo -n "" > .report.json
echo -n "" > .versions
} || {
echo fail > .status
}
"""
}
/** PROCESS_ASSEMBLY_MAPPING - MAIN
Processes the results from the assembly_mapping process and filters the
assembly contigs based on coverage and length thresholds.
*/
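// The process_assembly_mapping.py template invoked below is not reproduced in this
// listing. As a rough sketch (an assumption inferred from the parameters wired in
// above, not from the template itself): it reads coverages.tsv, discards contigs whose
// mean coverage falls below minAssemblyCoverage (or below an automatically estimated
// cutoff when that parameter is 'auto'), fails the sample if the filtered assembly
// still exceeds AMaxContigs for the expected genomeSize, and emits the surviving
// contigs as *_filt.fasta together with the matching filtered.bam/.bai.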
process process_assembly_mapping_8_16 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 8_16 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 8_16 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 8_16 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_8_16 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(assembly), file(coverage), file(coverage_bp), file(bam_file), file(bam_index) from MAIN_am_out_8_16
val opts from IN_assembly_mapping_opts_8_16
val gsize from IN_genome_size_8_16
output:
set sample_id, '*_filt.fasta', 'filtered.bam', 'filtered.bam.bai' into assembly_mapping_out_8_11_dep
set sample_id, val("8_16_process_am"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_process_am_8_16
set sample_id, val("process_am_8_16"), val("8_16"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_process_am_8_16
file ".versions"
script:
template "process_assembly_mapping.py"
}
SIDE_BpCoverage_8_16.set{ SIDE_BpCoverage_8_17 }
clear = params.clearInput_8_17 ? "true" : "false"
checkpointClear_8_17 = Channel.value(clear)
process pilon_8_17 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 8_17 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 8_17 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 8_17 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_8_17 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
echo false
publishDir 'results/assembly/pilon_8_17/', mode: 'copy', pattern: "*.fasta"
input:
set sample_id, file(assembly), file(bam_file), file(bam_index) from assembly_mapping_out_8_11_dep
val clear from checkpointClear_8_17
output:
set sample_id, '*_polished.fasta' into _pilon_out_8_11, pilon_report_8_17
set sample_id, val("8_17_pilon"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_8_17
set sample_id, val("pilon_8_17"), val("8_17"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_8_17
file ".versions"
script:
"""
{
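# Convert Nextflow's task.memory string (e.g. "4 GB") into a JVM-style value (e.g. "4G")
# by dropping the trailing character and any whitespace, so it can be passed to -Xmx below.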
pilon_mem=${String.valueOf(task.memory).substring(0, String.valueOf(task.memory).length() - 1).replaceAll("\\s", "")}
java -jar -Xms256m -Xmx\${pilon_mem} /NGStools/pilon-1.22.jar --genome $assembly --frags $bam_file --output ${assembly.name.replaceFirst(~/\.[^\.]+$/, '')}_polished --changes --threads $task.cpus >> .command.log 2>&1
echo pass > .status
if [ "$clear" = "true" ];
then
work_regex=".*/work/.{2}/.{30}/.*"
assembly_file=\$(readlink -f \$(pwd)/${assembly})
bam_file=\$(readlink -f \$(pwd)/${bam_file})
if [[ "\$assembly_file" =~ \$work_regex ]]; then
rm \$assembly_file \$bam_file
fi
fi
} || {
echo fail > .status
}
"""
}
process pilon_report_8_17 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
afterScript "report_POST.sh $params.projectId $params.pipelineId 8_17 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_8_17 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(coverage_bp) from pilon_report_8_17.join(SIDE_BpCoverage_8_17)
output:
file "*_assembly_report.csv" into pilon_report_out_8_17
set sample_id, val("8_17_pilon_report"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_report_8_17
set sample_id, val("pilon_report_8_17"), val("8_17"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_report_8_17
file ".versions"
script:
template "assembly_report.py"
}
process compile_pilon_report_8_17 {
publishDir "reports/assembly/pilon_8_17/", mode: 'copy'
input:
file(report) from pilon_report_out_8_17.collect()
output:
file "pilon_assembly_report.csv"
"""
echo Sample,Number of contigs,Average contig size,N50,Total assembly length,GC content,Missing data > pilon_assembly_report.csv
cat $report >> pilon_assembly_report.csv
"""
}
_pilon_out_8_11.into{ pilon_out_8_11;mlst_in_8_14;prokka_in_8_15;chewbbaca_in_8_16 }
if ( !params.minAssemblyCoverage_9_18.toString().isNumber() ){
if (params.minAssemblyCoverage_9_18.toString() != 'auto'){
exit 1, "'minAssemblyCoverage_9_18' parameter must be a number or 'auto'. Provided value: ${params.minAssemblyCoverage_9_18}"
}
}
if ( !params.AMaxContigs_9_18.toString().isNumber() ){
exit 1, "'AMaxContigs_9_18' parameter must be a number. Provide value: '${params.AMaxContigs_9_18}'"
}
IN_assembly_mapping_opts_9_18 = Channel.value([params.minAssemblyCoverage_9_18,params.AMaxContigs_9_18])
IN_genome_size_9_18 = Channel.value(params.genomeSize_9_18)
process assembly_mapping_9_18 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 9_18 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 9_18 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 9_18 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_9_18 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(fastq) from skesa_out_9_10.join(_LAST_fastq_9_18)
output:
set sample_id, file(assembly), 'coverages.tsv', 'coverage_per_bp.tsv', 'sorted.bam', 'sorted.bam.bai' into MAIN_am_out_9_18
set sample_id, file("coverage_per_bp.tsv") optional true into SIDE_BpCoverage_9_18
set sample_id, val("9_18_assembly_mapping"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_assembly_mapping_9_18
set sample_id, val("assembly_mapping_9_18"), val("9_18"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_assembly_mapping_9_18
file ".versions"
script:
"""
{
echo [DEBUG] BUILDING BOWTIE INDEX FOR ASSEMBLY: $assembly >> .command.log 2>&1
bowtie2-build --threads ${task.cpus} $assembly genome_index >> .command.log 2>&1
echo [DEBUG] MAPPING READS FROM $fastq >> .command.log 2>&1
bowtie2 -q --very-sensitive-local --threads ${task.cpus} -x genome_index -1 ${fastq[0]} -2 ${fastq[1]} -S mapping.sam >> .command.log 2>&1
echo [DEBUG] CONVERTING AND SORTING SAM TO BAM >> .command.log 2>&1
samtools sort -o sorted.bam -O bam -@ ${task.cpus} mapping.sam && rm *.sam >> .command.log 2>&1
echo [DEBUG] CREATING BAM INDEX >> .command.log 2>&1
samtools index sorted.bam >> .command.log 2>&1
echo [DEBUG] ESTIMATING READ DEPTH >> .command.log 2>&1
parallel -j ${task.cpus} samtools depth -ar {} sorted.bam \\> {}.tab ::: \$(grep ">" $assembly | cut -c 2- | tr " " "_")
# Insert 0 coverage count in empty files. See Issue #2
echo [DEBUG] REMOVING EMPTY FILES >> .command.log 2>&1
find . -size 0 -print0 | xargs -0 -I{} sh -c 'echo -e 0"\t"0"\t"0 > "{}"'
echo [DEBUG] COMPILING COVERAGE REPORT >> .command.log 2>&1
parallel -j ${task.cpus} echo -n {.} '"\t"' '&&' cut -f3 {} '|' paste -sd+ '|' bc >> coverages.tsv ::: *.tab
cat *.tab > coverage_per_bp.tsv
rm *.tab
if [ -f "coverages.tsv" ]
then
echo pass > .status
else
echo fail > .status
fi
echo -n "" > .report.json
echo -n "" > .versions
} || {
echo fail > .status
}
"""
}
/** PROCESS_ASSEMBLY_MAPPING - MAIN
Processes the results from the assembly_mapping process and filters the
assembly contigs based on coverage and length thresholds.
*/
process process_assembly_mapping_9_18 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 9_18 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 9_18 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 9_18 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId assembly_mapping_9_18 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(assembly), file(coverage), file(coverage_bp), file(bam_file), file(bam_index) from MAIN_am_out_9_18
val opts from IN_assembly_mapping_opts_9_18
val gsize from IN_genome_size_9_18
output:
set sample_id, '*_filt.fasta', 'filtered.bam', 'filtered.bam.bai' into assembly_mapping_out_9_12_dep
set sample_id, val("9_18_process_am"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_process_am_9_18
set sample_id, val("process_am_9_18"), val("9_18"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_process_am_9_18
file ".versions"
script:
template "process_assembly_mapping.py"
}
SIDE_BpCoverage_9_18.set{ SIDE_BpCoverage_9_19 }
clear = params.clearInput_9_19 ? "true" : "false"
checkpointClear_9_19 = Channel.value(clear)
process pilon_9_19 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 9_19 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 9_19 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 9_19 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_9_19 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
echo false
publishDir 'results/assembly/pilon_9_19/', mode: 'copy', pattern: "*.fasta"
input:
set sample_id, file(assembly), file(bam_file), file(bam_index) from assembly_mapping_out_9_12_dep
val clear from checkpointClear_9_19
output:
set sample_id, '*_polished.fasta' into pilon_out_9_12, pilon_report_9_19
set sample_id, val("9_19_pilon"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_9_19
set sample_id, val("pilon_9_19"), val("9_19"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_9_19
file ".versions"
script:
"""
{
pilon_mem=${String.valueOf(task.memory).substring(0, String.valueOf(task.memory).length() - 1).replaceAll("\\s", "")}
java -jar -Xms256m -Xmx\${pilon_mem} /NGStools/pilon-1.22.jar --genome $assembly --frags $bam_file --output ${assembly.name.replaceFirst(~/\.[^\.]+$/, '')}_polished --changes --threads $task.cpus >> .command.log 2>&1
echo pass > .status
if [ "$clear" = "true" ];
then
work_regex=".*/work/.{2}/.{30}/.*"
assembly_file=\$(readlink -f \$(pwd)/${assembly})
bam_file=\$(readlink -f \$(pwd)/${bam_file})
if [[ "\$assembly_file" =~ \$work_regex ]]; then
rm \$assembly_file \$bam_file
fi
fi
} || {
echo fail > .status
}
"""
}
process pilon_report_9_19 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
afterScript "report_POST.sh $params.projectId $params.pipelineId 9_19 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId pilon_9_19 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
}
tag { sample_id }
input:
set sample_id, file(assembly), file(coverage_bp) from pilon_report_9_19.join(SIDE_BpCoverage_9_19)
output:
file "*_assembly_report.csv" into pilon_report_out_9_19
set sample_id, val("9_19_pilon_report"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_pilon_report_9_19
set sample_id, val("pilon_report_9_19"), val("9_19"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_pilon_report_9_19
file ".versions"
script:
template "assembly_report.py"
}
process compile_pilon_report_9_19 {
publishDir "reports/assembly/pilon_9_19/", mode: 'copy'
input:
file(report) from pilon_report_out_9_19.collect()
output:
file "pilon_assembly_report.csv"
"""
echo Sample,Number of contigs,Average contig size,N50,Total assembly length,GC content,Missing data > pilon_assembly_report.csv
cat $report >> pilon_assembly_report.csv
"""
}
if ( params.abricateDataDir_9_20 ){
if ( !file(params.abricateDataDir_9_20).exists() ){
exit 1, "'abricateDataDir_9_20' data directory was not found: '${params.abricateDatabases_9_20}'"
}
dataDirOpt = "--datadir ${params.abricateDataDir_9_20}"
} else {
dataDirOpt = ""
}
if ( !params.abricateMinId_9_20.toString().isNumber() ){
exit 1, "'abricateMinId_9_20' parameter must be a number. Provide value: '${params.abricateMinId_9_20}'"
}
if ( !params.abricateMinCov_9_20.toString().isNumber() ){
exit 1, "'abricateMinCov_9_20' parameter must be a number. Provide value: '${params.abricateMinCov_9_20}'"
}
process abricate_9_20 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 9_20 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 9_20 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 9_20 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId abricate_9_20 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { "${sample_id} ${db}" }
publishDir "results/annotation/abricate_9_20/${sample_id}"
input:
set sample_id, file(assembly) from pilon_out_9_12
each db from params.abricateDatabases_9_20
val min_id from Channel.value(params.abricateMinId_9_20)
val min_cov from Channel.value(params.abricateMinCov_9_20)
output:
file '*.tsv' into abricate_out_9_20
set sample_id, val("9_20_abricate_$db"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_abricate_9_20
set sample_id, val("abricate_9_20_$db"), val("9_20"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_abricate_9_20
file ".versions"
script:
"""
{
# Run abricate
abricate $dataDirOpt --minid $min_id --mincov $min_cov --db $db $assembly > ${sample_id}_abr_${db}.tsv
echo pass > .status
} || {
echo fail > .status
}
"""
}
process process_abricate_9_20 {
tag "process_abricate_9_20"
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
afterScript "report_POST.sh $params.projectId $params.pipelineId 9_20 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId abricate_9_20 \"$params.platformSpecies\" false"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh"
}
input:
file abricate_file from abricate_out_9_20.collect()
output:
set val('process_abricate'), val("9_20_process_abricate"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_process_abricate_9_20
set val('process_abricate'), val("process_abricate_9_20"), val("9_20"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_process_abricate_9_20
file ".versions"
script:
template "process_abricate.py"
}
process mlst_10_21 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 10_21 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 10_21 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 10_21 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId mlst_10_21 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
// This process can only use a single CPU
cpus 1
input:
set sample_id, file(assembly) from mlst_in_8_14
output:
file '*.mlst.txt' into LOG_mlst_10_21
set sample_id, file(assembly), file(".status") into MAIN_mlst_out_10_21
set sample_id, val("10_21_mlst"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_mlst_10_21
set sample_id, val("mlst_10_21"), val("10_21"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_mlst_10_21
file ".versions"
script:
"""
{
expectedSpecies=${params.mlstSpecies_10_21}
mlst $assembly >> ${sample_id}.mlst.txt
mlstSpecies=\$(cat *.mlst.txt | cut -f2)
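# Build the minimal JSON report consumed by the platform: expected vs. detected species,
# the ST call, and the corresponding rows for the typing table.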
json_str="{'expectedSpecies':\'\$expectedSpecies\',\
'species':'\$mlstSpecies',\
'st':'\$(cat *.mlst.txt | cut -f3)',\
'tableRow':[{'sample':'${sample_id}','data':[\
{'header':'MLST species','value':'\$mlstSpecies','table':'typing'},\
{'header':'MLST ST','value':'\$(cat *.mlst.txt | cut -f3)','table':'typing'}]}]}"
echo \$json_str > .report.json
if [ ! \$mlstSpecies = \$expectedSpecies ];
then
printf fail > .status
else
printf pass > .status
fi
} || {
printf fail > .status
}
"""
}
process compile_mlst_10_21 {
publishDir "results/annotation/mlst_10_21/"
input:
file res from LOG_mlst_10_21.collect()
output:
file "mlst_report.tsv"
script:
"""
cat $res >> mlst_report.tsv
"""
}
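// Keep only samples whose detected MLST species matched the expected one; the mlst
// process above writes 'fail' to .status otherwise, and those samples are dropped here.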
mlst_out_10_14 = Channel.create()
MAIN_mlst_out_10_21
.filter{ it[2].text != "fail" }
.map{ [it[0], it[1]] }
.set{ mlst_out_10_14 }
IN_centre_11_22 = Channel.value(params.centre_11_22)
IN_kingdom_11_22 = Channel.value(params.kingdom_11_22)
// check if genus is provided or not
genusVar = (params.genus_11_22 == false) ? "" : "--usegenus --genus ${params.genus_11_22} "
process prokka_11_22 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 11_22 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 11_22 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 11_22 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId prokka_11_22 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
tag { sample_id }
publishDir "results/annotation/prokka_11_22/${sample_id}"
input:
set sample_id, file(assembly) from prokka_in_8_15
val centre from IN_centre_11_22
val kingdom from IN_kingdom_11_22
output:
file "${sample_id}/*"
set sample_id, val("11_22_prokka"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_prokka_11_22
set sample_id, val("prokka_11_22"), val("11_22"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_prokka_11_22
file ".versions"
script:
"""
{
prokka --outdir $sample_id --cpus $task.cpus --centre ${centre} \
--compliant --kingdom ${kingdom} ${genusVar} --increment 10 $assembly
echo pass > .status
} || {
echo fail > .status
}
"""
}
if ( !params.schemaPath_12_23 ){
exit 1, "'schemaPath_12_23' parameter missing"
}
if ( params.chewbbacaTraining_12_23){
if (!file(params.chewbbacaTraining_12_23).exists()) {
exit 1, "'chewbbacaTraining_12_23' file was not found: '${params.chewbbacaTraining_12_23}'"
}
}
if ( params.schemaSelectedLoci_12_23){
if (!file(params.schemaSelectedLoci_12_23).exists()) {
exit 1, "'schemaSelectedLoci_12_23' file was not found: '${params.schemaSelectedLoci_12_23}'"
}
}
if ( params.schemaCore_12_23){
if (!file(params.schemaCore_12_23).exists()) {
exit 1, "'schemaCore_12_23' file was not found: '${params.schemaCore_12_23}'"
}
}
IN_schema_12_23 = Channel.fromPath(params.schemaPath_12_23)
if (params.chewbbacaJson_12_23 == true){
jsonOpt = "--json"
} else {
jsonOpt = ""
}
if (params.chewbbacaTraining_12_23){
training = "--ptf ${params.chewbbacaTraining_12_23}"
} else {
training = ""
}
// If chewbbaca is executed in batch mode, wait for all assembly files
// to be collected on the input channel, and only then execute chewbbaca
// providing all samples simultaneously
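// (In the batch branch this is done with chewbbaca_in_8_16.map{ it[1] }.collect(),
// so a single chewBBACA task receives every assembly; otherwise one task runs per sample.)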
if (params.chewbbacaBatch_12_23) {
process chewbbaca_batch_12_23 {
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 12_23 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 12_23 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 12_23 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId chewbbaca_12_23 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
maxForks 1
scratch false
if (params.chewbbacaQueue_12_23 != null) {
queue "${params.chewbbacaQueue_12_23}"
}
publishDir "results/chewbbaca_alleleCall_12_23/", mode: "copy"
input:
file assembly from chewbbaca_in_8_16.map{ it[1] }.collect()
each file(schema) from IN_schema_12_23
output:
file 'chew_results*'
file 'cgMLST.tsv' optional true into chewbbacaProfile_12_23
set val('single'), val("12_23_chewbbaca"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_chewbbaca_12_23
set val('single'), val("chewbbaca_12_23"), val("12_23"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_chewbbaca_12_23
file ".versions"
script:
"""
{
set -x
if [ -d "$schema/temp" ];
then
rm -r $schema/temp
fi
if [ "$params.schemaSelectedLoci_12_23" = "null" ];
then
inputGenomes=$schema
else
inputGenomes=${params.schemaSelectedLoci_12_23}
fi
echo $assembly | tr " " "\n" >> input_file.txt
chewBBACA.py AlleleCall -i input_file.txt -g \$inputGenomes -o chew_results $jsonOpt --cpu $task.cpus $training
if [ "$jsonOpt" = "--json" ]; then
merge_json.py ${params.schemaCore_12_23} chew_results/*/results*
else
cp chew_results*/*/results_alleles.tsv cgMLST.tsv
fi
} || {
echo fail > .status
}
"""
}
} else {
process chewbbaca_12_23 {
// Send POST request to platform
if ( params.platformHTTP != null ) {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH; set_dotfiles.sh; startup_POST.sh $params.projectId $params.pipelineId 12_23 $params.platformHTTP"
afterScript "final_POST.sh $params.projectId $params.pipelineId 12_23 $params.platformHTTP; report_POST.sh $params.projectId $params.pipelineId 12_23 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId chewbbaca_12_23 \"$params.platformSpecies\" true"
} else {
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; set_dotfiles.sh"
}
maxForks 1
tag { sample_id }
scratch true
if (params.chewbbacaQueue_12_23 != null) {
queue "${params.chewbbacaQueue_12_23}"
}
publishDir "results/chewbbaca_alleleCall_12_23/", mode: "copy"
input:
set sample_id, file(assembly) from chewbbaca_in_8_16
each file(schema) from IN_schema_12_23
output:
file 'chew_results*'
file '*_cgMLST.tsv' optional true into chewbbacaProfile_12_23
set sample_id, val("12_23_chewbbaca"), file(".status"), file(".warning"), file(".fail"), file(".command.log") into STATUS_chewbbaca_12_23
set sample_id, val("chewbbaca_12_23"), val("12_23"), file(".report.json"), file(".versions"), file(".command.trace") into REPORT_chewbbaca_12_23
file ".versions"
script:
"""
{
set -x
if [ -d "$schema/temp" ];
then
rm -r $schema/temp
fi
if [ "$params.schemaSelectedLoci_12_23" = "null" ];
then
inputGenomes=$schema
else
inputGenomes=${params.schemaSelectedLoci_12_23}
fi
echo $assembly >> input_file.txt
chewBBACA.py AlleleCall -i input_file.txt -g \$inputGenomes -o chew_results_${sample_id} $jsonOpt --cpu $task.cpus $training --fc
if [ "$jsonOpt" = "--json" ]; then
merge_json.py ${params.schemaCore_12_23} chew_results_*/*/results* ${sample_id}
else
mv chew_results_*/*/results_alleles.tsv ${sample_id}_cgMLST.tsv
fi
} || {
echo fail > .status
}
"""
}
}
process chewbbacaExtractMLST_12_23 {
publishDir "results/chewbbaca_12_23/", mode: "copy", overwrite: true
input:
file profiles from chewbbacaProfile_12_23.collect()
output:
file "results/cgMLST.tsv"
"""
head -n1 ${profiles[0]} > chewbbaca_profiles.tsv
awk 'FNR == 2' $profiles >> chewbbaca_profiles.tsv
chewBBACA.py ExtractCgMLST -i chewbbaca_profiles.tsv -o results -p $params.chewbbacaProfilePercentage_12_23
"""
}
/** STATUS
Reports the status of a sample in any given process.
*/
process status {
tag { sample_id }
publishDir "pipeline_status/$task_name"
input:
set sample_id, task_name, status, warning, fail, file(log) from STATUS_reads_download_1_1.mix(STATUS_integrity_coverage_2_2,STATUS_spades_2_3,STATUS_skesa_3_4,STATUS_integrity_coverage_4_5,STATUS_megahit_4_6,STATUS_megahit_fastg_4_6,STATUS_integrity_coverage_5_7,STATUS_fastqc_5_8,STATUS_fastqc_report_5_8,STATUS_trimmomatic_5_8,STATUS_assembly_mapping_3_9,STATUS_process_am_3_9,STATUS_pilon_3_10,STATUS_pilon_report_3_10,STATUS_fastqc2_5_11,STATUS_fastqc2_report_5_11,STATUS_abricate_6_12,STATUS_process_abricate_6_12,STATUS_chewbbaca_7_13,STATUS_spades_8_14,STATUS_skesa_9_15,STATUS_assembly_mapping_8_16,STATUS_process_am_8_16,STATUS_pilon_8_17,STATUS_pilon_report_8_17,STATUS_assembly_mapping_9_18,STATUS_process_am_9_18,STATUS_pilon_9_19,STATUS_pilon_report_9_19,STATUS_abricate_9_20,STATUS_process_abricate_9_20,STATUS_mlst_10_21,STATUS_prokka_11_22,STATUS_chewbbaca_12_23)
output:
file '*.status' into master_status
file '*.warning' into master_warning
file '*.fail' into master_fail
file '*.log'
"""
echo $sample_id, $task_name, \$(cat $status) > ${sample_id}_${task_name}.status
echo $sample_id, $task_name, \$(cat $warning) > ${sample_id}_${task_name}.warning
echo $sample_id, $task_name, \$(cat $fail) > ${sample_id}_${task_name}.fail
echo "\$(cat .command.log)" > ${sample_id}_${task_name}.log
"""
}
process compile_status_buffer {
input:
file status from master_status.buffer( size: 5000, remainder: true)
file warning from master_warning.buffer( size: 5000, remainder: true)
file fail from master_fail.buffer( size: 5000, remainder: true)
output:
file 'master_status_*.csv' into compile_status_buffer
file 'master_warning_*.csv' into compile_warning_buffer
file 'master_fail_*.csv' into compile_fail_buffer
"""
cat $status >> master_status_${task.index}.csv
cat $warning >> master_warning_${task.index}.csv
cat $fail >> master_fail_${task.index}.csv
"""
}
process compile_status {
publishDir 'reports/status'
input:
file status from compile_status_buffer.collect()
file warning from compile_warning_buffer.collect()
file fail from compile_fail_buffer.collect()
output:
file "*.csv"
"""
cat $status >> master_status.csv
cat $warning >> master_warning.csv
cat $fail >> master_fail.csv
"""
}
/** Reports
Compiles the reports from every process
*/
process report {
tag { sample_id }
input:
set sample_id,
task_name,
pid,
report_json,
version_json,
trace from REPORT_reads_download_1_1.mix(REPORT_integrity_coverage_2_2,REPORT_spades_2_3,REPORT_skesa_3_4,REPORT_integrity_coverage_4_5,REPORT_megahit_4_6,REPORT_megahit_fastg_4_6,REPORT_integrity_coverage_5_7,REPORT_fastqc_5_8,REPORT_fastqc_report_5_8,REPORT_trimmomatic_5_8,REPORT_assembly_mapping_3_9,REPORT_process_am_3_9,REPORT_pilon_3_10,REPORT_pilon_report_3_10,REPORT_fastqc2_5_11,REPORT_fastqc2_report_5_11,REPORT_abricate_6_12,REPORT_process_abricate_6_12,REPORT_chewbbaca_7_13,REPORT_spades_8_14,REPORT_skesa_9_15,REPORT_assembly_mapping_8_16,REPORT_process_am_8_16,REPORT_pilon_8_17,REPORT_pilon_report_8_17,REPORT_assembly_mapping_9_18,REPORT_process_am_9_18,REPORT_pilon_9_19,REPORT_pilon_report_9_19,REPORT_abricate_9_20,REPORT_process_abricate_9_20,REPORT_mlst_10_21,REPORT_prokka_11_22,REPORT_chewbbaca_12_23)
output:
file "*" optional true into master_report
"""
prepare_reports.py $report_json $version_json $trace $sample_id $task_name 1 $pid $workflow.scriptId $workflow.runName
"""
}
process compile_reports {
publishDir "pipeline_report/", mode: "copy"
if ( params.reportHTTP != null ){
beforeScript "PATH=${workflow.projectDir}/bin:\$PATH; export PATH;"
afterScript "metadata_POST.sh $params.projectId $params.pipelineId 0 $params.sampleName $params.reportHTTP $params.currentUserName $params.currentUserId 0 \"$params.platformSpecies\""
}
input:
file report from master_report.collect()
file forks from Channel.fromPath("${workflow.projectDir}/.forkTree.json")
file dag from Channel.fromPath("${workflow.projectDir}/.treeDag.json")
file js from Channel.fromPath("${workflow.projectDir}/resources/main.js.zip")
output:
file "pipeline_report.json"
file "pipeline_report.html"
file "src/main.js"
script:
template "compile_reports.py"
}
workflow.onComplete {
// Display complete message
log.info "Completed at: " + workflow.complete
log.info "Duration : " + workflow.duration
log.info "Success : " + workflow.success
log.info "Exit status : " + workflow.exitStatus
}
workflow.onError {
// Display error message
log.info "Workflow execution stopped with the following message:"
log.info " " + workflow.errorMessage
}
Cheap and disposable pipeline
flowcraft build -r denim -o my_denim_pipeline
Recipes as shareable workflows created on the fly
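Besides prebuilt recipes, a pipeline can also be assembled from an explicit component string and run directly; a minimal sketch (component string and file paths are illustrative, not from the talk):
flowcraft build -t "integrity_coverage fastqc_trimmomatic spades pilon abricate" -o my_pipeline.nf
nextflow run my_pipeline.nf --fastq "data/*_{1,2}.fastq.gz"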
European Food Safety Authority funded project for genomic surveillance of food-borne pathogens
Specification of pipeline components by the user
42h on 200 CPUs
151 samples
1812 assemblies
43s/assembly
[Figure: trimming profiles vs. sampled assemblies; dots above the red line mark the same sample interpreted with different profiles, potentially misinterpreted as an outbreak]
Enables the creation of automatic pipelines on the fly
Diogo N Silva
Tiago F Jesus
Catarina I Mendes
Bruno Ribeiro-Gonçalves
Prof. Mário Ramirez
Prof. João André Carriço
Lifebit Bioinformatician
Lifebit Bioinformatician
Lifebit Bioinformatician
PhD at IMM
PI at IMM
Researcher at IMM
and happy pipeline building
conda install flowcraft
pip install flowcraft
BacGenTrack project [FCT / Scientific and Technological Research Council of Turkey, TUBITAK/0004/2014]
Funding and acknowledgements
ONEIDA project (LISBOA-01-0145-FEDER-016417) co-funded by 'Fundos Internacionais Europeus Estruturais e de Investimento' and by national funds from FCT - Fundação para a Ciência e Tecnologia
Lifebit Biotech Ltd.