Lab meeting
05/02/2018
Phylogenomic and population genomic insights on the evolutionary history of rust fungi and Coffee Leaf Rust
Diogo Nuno Proença Rico Silva
Phylogenomics
Studies evolutionary patterns above the species level
Provides insights on how taxonomic groups evolved
Data usually consists of DNA or protein sequences from distantly related taxa
Population genomics
Studies evolutionary patterns below the species level
Unravels the evolutionary history of populations within a species
Data usually consists of DNA sequences from closely related taxa
In this project:
Phylogenomics of the Basidiomycota focusing on the rust fungi
Population genomics of H. vastatrix, the causal agent of Coffee Leaf Rust
Software development:
TriFusion: Streamlining phylogenomic data gathering, processing and visualization
The rust fungi
Stem rust
Puccinia graminis
Poplar rust
Melampsora spp.
Wheat leaf rust
Puccinia triticina
Soybean rust
Phakopsora spp.
Broad bean rust
Uromyces viciae-fabae
Coffee leaf rust
Hemileia vastatrix
The rust fungi
Life style:
Genomic characteristics:
Obligate biotrophy
Expansion of gene families resulting in high number of genes
Absence or loss of genes involved in nutrient uptake
Larger repertoire of small secreted proteins
Proliferation of transposons (mobile genetic elements)
"Big" genomes
Forgot how to survive outside the host
Excel at nullifying the host's defenses
Really, "big" genomes
What?
Part I
Phylogenomics
On the origin of the rusts:
What was the role of adaptive genetic variation in genes shared by rusts and other Basidiomycota?
Published in PLOS One
Phylogenomics | objectives
- Detect the largest number of single-copy orthologs shared among a data set of 67 Basidiomycota and Ascomycota genomes and EST data sets
- Screen for episodic selection acting on specific amino acids and determine the magnitude of the selection signal on the origin of the rust fungi
- Annotate the candidate genes and check for functional classes enriched for positively selected genes
Phylogenomics | methods
48 complete genomes
+
21 EST data sets
Ortholog assembly
(OrthoMCL + HaMSTr)
Pre-alignment QC
Alignment of putative orthologs
Post-alignment QC
Removal of saturated alignments
Removal of outlier alignments
Missing data filtering and file conversion
Data set creation
(3093 orthologs)
Ortholog assembly workflow
~ 50 scripts
~ 5000 lines of code
9 months
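The single-copy filtering step at the core of the ortholog assembly can be illustrated with a minimal sketch. This is not the project's actual workflow (which relied on OrthoMCL and HaMSTr plus custom scripts); the function name, the cluster layout and the taxon labels below are hypothetical.

```python
from collections import Counter


def single_copy_clusters(clusters, min_taxa):
    """Keep clusters with exactly one gene per taxon and at least min_taxa taxa.

    clusters: dict mapping a cluster id to a list of (taxon, gene_id) tuples.
    """
    kept = {}
    for cluster_id, members in clusters.items():
        # Count how many genes each taxon contributes to this cluster
        copies = Counter(taxon for taxon, _ in members)
        enough_taxa = len(copies) >= min_taxa
        single_copy = all(n == 1 for n in copies.values())
        if enough_taxa and single_copy:
            kept[cluster_id] = members
    return kept


# Hypothetical toy clusters
clusters = {
    "OG1": [("Pgraminis", "g1"), ("Hvastatrix", "g7"), ("Mlarici", "g3")],
    "OG2": [("Pgraminis", "g2"), ("Pgraminis", "g9"), ("Hvastatrix", "g4")],  # paralogs
    "OG3": [("Pgraminis", "g5")],  # too few taxa
}
result = single_copy_clusters(clusters, min_taxa=2)
print(sorted(result))
```

Only OG1 passes both filters here: OG2 has two copies in one taxon and OG3 is represented by a single taxon.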
Phylogenomics | results
Maximum Likelihood reconstruction using RAxML
3093 genes - 67 taxa
Detection of positive selection at the root branch of the rust fungi
Phylogenomics | results
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
104 genes (19.6%) with signatures of positive selection
289 amino acid sites across 72 genes
Profiling selected amino acid sites
Unique: single variant exclusive to rust fungi
Diversifying: multiple variants exclusive to rust fungi
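Branch-site tests of this kind are usually evaluated with a likelihood ratio test between a null model and a model that allows positive selection on the chosen branch. The sketch below assumes the common convention of comparing twice the log-likelihood difference against a chi-square distribution with 1 degree of freedom; the slides do not state the threshold used, and the log-likelihood values are hypothetical.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Return the LRT statistic and its p-value under chi-square with 1 df."""
    # The statistic cannot be negative; clamp small negative values from optimisation noise
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one gene
stat, p_value = branch_site_lrt(lnl_null=-10234.51, lnl_alt=-10228.02)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4g}")
```

With thousands of genes tested, the resulting p-values would normally be corrected for multiple testing before counting genes as positively selected.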
Phylogenomics | results
Function annotation according to 21 KOG classes
Enrichment of several transport and metabolism classes
Reflection of extensive genomic changes of nutrient transport and uptake for obligate biotrophs
Enrichment of secondary metabolite biosynthesis
Possible triggers of plant defenses
Phylogenomics | conclusion
- Phylogenomic analyses hold significant promise in identifying important evolutionary transitions using positive selection detection
- Current methods allow researchers to pinpoint the action of natural selection on specific periods of the evolutionary history and on specific regions of genes
- The pervasive signal of positive selection suggests that the transition of the rust fungi to obligate biotrophy required significant adaptive changes in conserved genes
Methods and software for gathering and processing phylogenomic data are severely lacking!
Phylogenomics | Bonus level
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
71 sites (24%) across 45 genes with signatures of positive selection on conserved amino acid sites.
Phylogenomics | Bonus level
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
Transition from AGY to TCN (both encoding serine) requires at least 2 non-synonymous mutations to maintain the same amino acid.
Substantial shift in codon usage between rusts and non-rusts for these amino acids.
Why?
Software artifacts
Role of positive selection on codon usage
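The AGY to TCN claim can be checked directly against the standard genetic code. The sketch below builds the standard codon table, measures the minimum number of nucleotide changes between the two serine codon families, and lists the amino acids encoded by the single-step intermediates. Variable and function names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

agy = [c for c, aa in CODE.items() if aa == "S" and c.startswith("AG")]
tcn = [c for c, aa in CODE.items() if aa == "S" and c.startswith("TC")]


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbours(codon):
    """All codons that differ from `codon` at exactly one position."""
    return {
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    }


min_changes = min(hamming(a, b) for a in agy for b in tcn)

# Codons lying one step from an AGY codon and one step from a TCN codon
intermediates = {
    m
    for a in agy
    for b in tcn
    if hamming(a, b) == min_changes
    for m in neighbours(a) & neighbours(b)
}
intermediate_aas = {m: CODE[m] for m in sorted(intermediates)}
print(min_changes, intermediate_aas)
```

Every shortest path passes through a cysteine or threonine codon, so both steps change the encoded amino acid even though the start and end codons both encode serine.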
Part II
TriFusion
Streamlining phylogenomic data gathering, processing and visualization
Under review in Systematic Biology
TriFusion | description
Orthology
Process
Statistics
Search for orthologs across multiple genomes
Filters orthologs according to gene copy number and number of taxa
Graphical and interactive exploration of ortholog clusters
Export orthologs as protein and/or nucleotide sequences
Complete proteomes
Sequence alignments
Concatenates/converts thousands of alignments
Collapses, filters, codes gaps, creates consensus
Supports custom partitions schemes and substitution models
Supports 10+ popular alignment formats
Fast and efficient
Summary statistics for thousands of files in seconds
Dozens of graphical options to explore alignment data
Automatic detection of outlier genes/taxa
Plot fast switching for quick exploration
Generation of publication ready figures
TriFusion | benchmarks
Concatenation
2-52k alignments
40-376 taxa
17-567Mb
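As an illustration of what the benchmarked concatenation operation involves, here is a minimal sketch that joins per-gene alignments into a supermatrix, fills absent taxa with missing data and records partition coordinates. It is not TriFusion's implementation or API; the function name, the dictionary-based alignment layout and the choice of "N" as missing symbol are assumptions made for the example.

```python
def concatenate(alignments, missing="N"):
    """Concatenate alignments given as dicts of taxon -> aligned sequence.

    Returns the supermatrix (taxon -> sequence) and a list of
    (partition name, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for i, aln in enumerate(alignments, start=1):
        length = len(next(iter(aln.values())))
        if any(len(seq) != length for seq in aln.values()):
            raise ValueError(f"alignment {i} has sequences of unequal length")
        for taxon in taxa:
            # Taxa absent from this gene are padded with the missing symbol
            parts[taxon].append(aln.get(taxon, missing * length))
        partitions.append((f"gene{i}", start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return supermatrix, partitions


# Hypothetical toy alignments
gene1 = {"A": "ACGT", "B": "ACGA"}
gene2 = {"A": "GG", "C": "GA"}
supermatrix, partitions = concatenate([gene1, gene2])
print(supermatrix)
print(partitions)
```

Collecting chunks in lists and joining once per taxon avoids repeated string copying, which matters when thousands of alignments are combined.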
TriFusion | integrations
- Continuous integration with Travis CI with a suite of more than 220 tests using unittest.
- Extensive API and code documentation automatically built using Sphinx and Read the Docs.
- Step by step animated tutorials for all functionalities
TriFusion | conclusions
- TriFusion is an easy to install/use and feature-rich application that allows researchers with no bioinformatics experience to gather, process and visualize large phylogenomic data
- Provides a comprehensive suite of complex and computationally intensive operations with unparalleled performance
- Has the potential to open up the execution of phylogenomic studies to a much wider community by removing the requirements for bioinformatics/programming expertise
Part III
Population genomics
Using RADseq sequencing to investigate the population genomics of Hemileia vastatrix
Accepted with changes in Molecular Plant Pathology
Pop genomics | introduction
Hemileia vastatrix:
... is a biotrophic pathogen
... causes Coffee Leaf Rust worldwide
... with yield losses up to 35%
... more than 50 pathotypes/races
... responsible for multiple outbreaks across Latin America
What do we know?
Diversity
- Low
- Moderate
- High
Clonality
- Mostly evidence of clonality
- Recombination in some regions
- Existence of cryptosexuality
Structure
No differentiation
... by geography
... by pathotype
... by host
Pop genomics | objectives
- Produce thousands of high quality SNPs for H. vastatrix using RAD-sequencing and technical replicates
- Investigate the genetic structure of H. vastatrix, with focus on how it impacts the evolutionary potential
- Test the clonality (or not) of H. vastatrix
Pop genomics | sampling
39 isolates (30 unique + 9 replicates) from CIFC collection
Pathotypes
20
Sampling age range
1954-2013
Hosts
9 diploids (C. canephora and others)
21 tetraploids (C. arabica, HDT, inter-specific hybrids)
Pop genomics | results
Maximum Likelihood reconstruction using RAxML (~20k SNPs)
1. Evidence of population structure according to host
2. Near absent structure among C3 isolates
3. Ladder-like pattern at the base of the C3 group
Pop genomics | results
Population structure and introgression
Diploid hosts
Tetraploid hosts
Almost complete population differentiation
3 isolates with allele sharing signal
Supports the scenario of hybridization and introgression
Pop genomics | results
Emergence of the C3 group
Could the C3 group be the result of a recent introduction from diploid coffee hosts?
Divergence between C2 and C3 groups
Diversification of the C3 group
Pop genomics | results
Recombination within the C3 group
Sexual
Clonal
Association index: Measures linkage disequilibrium between SNPs and compares with expected distribution under equilibrium
Significant evidence of recombination occurring within the C3 group
Pop genomics | conclusions
- Multiple divergent and genetically isolated lineages within H. vastatrix suggest the presence of a cryptic species complex
- The presence of allele sharing between isolated lineages warns about the possibility of exchanging virulence factors
- Presence of H. vastatrix isolates in tetraploid hosts could be the result of a recent introduction followed by a specialization process
- Recombination (through an unknown process) is likely to occur within isolates infecting C. arabica and to be a major source of genetic variation
Other outputs / contributions
Courses given (1):
- Python 101: From biologists to biologists
Talks given (6):
- LEAF seminar
- Encontros Scientia
- ASIC 2014/2016
- PDP2017
Code contributions (~3500):
- Kivy
- MMSeqs2
- ipyrad
Paper co-authorship (7):
- With Silva, M @ Microbial Genomics
- With Pina-Martins, F @ Molecular Ecology Resources
- With Batista, D @ Frontiers in Plant Science
- With Talhinhas, P @ Molecular Plant Pathology
- With Silva, SE @ Systematics and Biodiversity
- With Romeiras, MM @ PLOS One
- With Rodrigues, ASB @ PLOS One
Thank you for your attention
Acknowledgements:
CIFC
CoBiG2