Phylogenomic and population genomic insights on the evolutionary history of Coffee Leaf Rust within the rust fungi
Diogo Nuno Proença Rico Silva
Doctoral dissertation
Advisors:
Dr. Dora Batista
Prof. Dr. Octávio S. Paulo
Fungi responsible for ~30% of emerging diseases in plants
The threat has been heightened by:
- Resource-rich agricultural practices
- Globalization
- Climate change
Genomics to the rescue!
Better understand the evolutionary history and potential of pathogen populations
Include eco-evolutionary principles in disease control measures
Fisher et al. 2012
Stem rust
Puccinia graminis
Poplar rust
Melampsora spp.
Wheat leaf rust
Puccinia triticina
Soybean rust
Phakopsora spp.
Faba bean rust
Uromyces viciae-fabae
Coffee leaf rust
Hemileia vastatrix
Top 3
Top 3
Top 10
Studies evolutionary patterns above the species level
Provides insights on how taxonomic groups evolved
Data usually consists of DNA or protein sequences from distantly related taxa
Studies evolutionary patterns below the species level
Unravels the evolutionary history of populations within a species
Data usually consists of DNA sequences or SNPs from closely related taxa
How can it contribute:
How can it contribute:
Macro-evolutionary patterns of genetic diversity that correlate with functional innovations
Mechanisms of adaptation and population divergence in pathogens
Chapter II: Phylogenomics
Chapter III: Population genomics
Chapter IV: Software development
Silva et al. 2015 PLOS One, DOI: 10.1371/journal.pone.0143959
Obligate biotrophy
Expansion of gene families resulting in high number of genes
Absence or loss of genes involved in nutrient uptake
Larger repertoire of small secreted proteins
Proliferation of transposons (mobile genetic elements)
"Big" genomes
Forgot how to survive outside the host (oops)
Excel at nullifying the host's defenses
Really, "big" genomes
What?
48 complete genomes
+
21 EST data sets
Ortholog assembly
(OrthoMCL + HaMSTr)
Pre-alignment QC
Alignment of putative orthologs
Post-alignment QC
Removal of saturated alignments
Removal of outlier alignments
Missing data filtering and file conversion
Data set creation
(3093 orthologs)
Ortholog assembly workflow
~ 50 scripts
~ 5000 lines of code
9 months
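The missing data filtering step of the workflow above can be illustrated with a minimal Python sketch. It is not the original pipeline code: the function names, the 50% occupancy threshold and the toy alignments are assumptions made for illustration only.

```python
# Minimal sketch of missing-data filtering by taxon occupancy.
# Assumption: alignments are protein alignments stored as {taxon: sequence},
# and "-", "?" and "X" mark gaps or missing data.
MISSING = set("-?X")


def taxon_occupancy(alignment, all_taxa):
    """Fraction of the expected taxa with at least one informative residue."""
    present = 0
    for taxon in all_taxa:
        seq = alignment.get(taxon, "")
        if any(ch.upper() not in MISSING for ch in seq):
            present += 1
    return present / len(all_taxa)


def filter_by_occupancy(alignments, all_taxa, min_occupancy=0.5):
    """Keep only alignments whose taxon occupancy reaches the threshold.

    min_occupancy=0.5 is an arbitrary illustrative value, not the
    threshold used in the original study.
    """
    return {
        name: aln
        for name, aln in alignments.items()
        if taxon_occupancy(aln, all_taxa) >= min_occupancy
    }


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]  # hypothetical taxon labels
    toy = {
        "gene1": {"A": "MKT", "B": "MKS", "C": "M-T", "D": "---"},
        "gene2": {"A": "MK", "B": "--"},
    }
    kept = filter_by_occupancy(toy, taxa)
    print(sorted(kept))
```

Filtering on taxon occupancy is only one of several possible criteria; per-site or per-sequence gap proportions could be used in the same way.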
Maximum Likelihood reconstruction using RAxML
3093 genes - 67 taxa
Detection of positive selection at the root branch of the rust fungi
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
104 genes (19.6%) with signatures of positive selection
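The branch-site test compares a null model (no positive selection on the tested branch) with an alternative model through a likelihood ratio test. The sketch below shows that comparison for one gene. The log-likelihood values are made up, and the use of a chi-square distribution with 1 degree of freedom is a common convention assumed here; the slides do not state which threshold or multiple-testing correction was applied.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood ratio test between null and alternative branch-site models.

    Returns (statistic, p_value). The p-value assumes a chi-square
    distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(x / 2)).
    """
    # The alternative model nests the null, so negative values are
    # treated as numerical noise and clamped to zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a single gene (not from the study).
    stat, p = branch_site_lrt(lnl_null=-12345.67, lnl_alt=-12340.12)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4g}")
```

In practice the test is repeated for every gene, so the resulting p-values are usually corrected for multiple testing before counting genes under selection.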
Profiling selected amino acid sites:
Rusts
Non rusts
Unique
Strict
Relaxed
Diversifying
Strict
Relaxed
Functional annotation according to 21 KOG classes
Enrichment of several transport and metabolism classes
Genomic changes of nutrient transport and uptake
Enrichment of secondary metabolite biosynthesis
Possible triggers of plant defenses
71 sites (24%) across 45 genes with signatures of positive selection on conserved amino acid sites.
Most prevalent class per gene
Switching from AGY to TCN codons (both encode serine) requires at least 2 non-synonymous mutations to maintain the same amino acid.
Substantial shift in codon usage between rusts and non-rusts for these amino acids.
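The two-mutation requirement can be checked directly against the standard genetic code, in which both AGY and TCN encode serine. The sketch below enumerates the shortest mutational paths between the two codon families and reports the amino acid encoded by each intermediate codon. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

AGY = ["AGT", "AGC"]
TCN = ["TCT", "TCC", "TCA", "TCG"]


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(sources, targets):
    """Smallest number of substitutions separating the two codon families."""
    return min(hamming(s, t) for s in sources for t in targets)


def intermediate_amino_acids(sources, targets):
    """Amino acids encoded by the intermediates of all two-step paths."""
    found = set()
    for src, dst in product(sources, targets):
        if hamming(src, dst) != 2:
            continue
        for i in range(3):
            if src[i] != dst[i]:
                # Apply only one of the two required substitutions.
                mid = src[:i] + dst[i] + src[i + 1:]
                found.add(CODON_TABLE[mid])
    return found


if __name__ == "__main__":
    print("minimum substitutions:", min_substitutions(AGY, TCN))
    print("intermediate amino acids:", sorted(intermediate_amino_acids(AGY, TCN)))
```

Because no intermediate codon encodes serine, each of the two steps on a shortest path changes the amino acid, which is why the switch cannot proceed through synonymous mutations alone.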
Why?
- Role of positive selection?
- Purely neutral mechanisms?
Silva et al. 2018 Molecular Plant Pathology, DOI: 10.1111/mpp.12657
Hemileia vastatrix:
... Hemicyclic with urediniospores as functional propagules
... causes Coffee Leaf Rust with losses of up to 30%
... infects C. arabica (tetraploid) and C. canephora (diploid)
... has spread worldwide since 1861 (Lake Victoria)
... more than 50 pathotypes/races
Literature state of the art?
Diversity
Clonality
No differentiation
Structure
... by geography
... by pathotype
... by host
38 isolates (29 unique + 9 replicates):
Maximum Likelihood reconstruction using RAxML (~20k SNPs)
1. Evidence of population structure according to host
2. Near-absent structure among C3 isolates
3. Ladder-like pattern at the base of the C3 group
Population structure and differentiation
Diploid hosts
Tetraploid hosts
Almost complete population differentiation
Structure
Principal Component analysis
Introgression
Putative introgressed isolates
Supports the scenario of hybridization and introgression
Substantial allele sharing
Excess of heterozygosity
C2 > C3
C3 > C2
Emergence of the C3 group
Could the C3 group be the result of a recent introduction from diploid coffee hosts?
Divergence between C2 and C3 groups
Diversification of the C3 group
Recombination within the C3 group
Sexual
Clonal
Association index: Measures linkage disequilibrium between SNPs and compares it with the distribution expected under linkage equilibrium
Significant evidence of recombination occurring within the C3 group
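The logic of the association index test can be sketched as follows. This is an assumption-laden illustration rather than the analysis behind the slide: it uses the classic index of association (observed variance of pairwise distances divided by the variance expected under linkage equilibrium, minus one) on haploid-coded genotypes, with a permutation test that shuffles alleles within each locus. The slides do not state which formulation or software was used, and the toy genotypes are invented.

```python
import random
from itertools import combinations


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def index_of_association(genotypes):
    """Classic index of association, I_A = V_obs / V_exp - 1.

    genotypes: list of equal-length allele sequences, one per isolate.
    """
    n_loci = len(genotypes[0])
    pairs = list(combinations(range(len(genotypes)), 2))
    # 0/1 distance at every locus for every pair of isolates.
    per_locus = [
        [int(genotypes[a][j] != genotypes[b][j]) for a, b in pairs]
        for j in range(n_loci)
    ]
    totals = [sum(col) for col in zip(*per_locus)]
    v_obs = variance(totals)
    # Under linkage equilibrium loci are independent, so variances add up.
    v_exp = sum(variance(d) for d in per_locus)
    if v_exp == 0:
        raise ValueError("no polymorphic loci")
    return v_obs / v_exp - 1.0


def permutation_test(genotypes, n_perm=999, seed=1):
    """Shuffle alleles within each locus to simulate free recombination."""
    rng = random.Random(seed)
    observed = index_of_association(genotypes)
    columns = [list(col) for col in zip(*genotypes)]
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        permuted = [list(row) for row in zip(*shuffled)]
        if index_of_association(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two clonal lineages, i.e. strong linkage between loci.
    toy = ["000000"] * 3 + ["111111"] * 3
    ia, p = permutation_test(toy, n_perm=199)
    print(f"I_A = {ia:.3f}, p = {p:.3f}")
```

With this one-sided test, an observed value that falls within the permuted distribution is compatible with recombination, whereas a value well above it points to clonality.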
Orthology
Process
Statistics
Search orthologs...
Filter orthologs...
Graphical exploration
Export as protein/DNA sequences
Complete proteomes
Sequence alignments
Convert/Concatenation
Collapse, filter, code gaps, create consensus
Partition schemes
Substitution models
+10 popular alignment formats
Fast and efficient
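Concatenation with a partition scheme, as listed above, can be sketched in a few lines of Python. This is not the software's actual API: the function name, the missing-data symbol and the toy alignments are hypothetical, and partitions are reported as 1-based inclusive ranges, as is common in partition files.

```python
def concatenate(alignments, missing="X"):
    """Concatenate alignments into a supermatrix and record partitions.

    alignments: ordered list of (name, {taxon: aligned_sequence}).
    Taxa absent from an alignment are padded with the missing symbol.
    Returns ({taxon: concatenated_sequence}, [(name, start, end), ...]).
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences do not share one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Hypothetical toy alignments with partially overlapping taxa.
    toy = [
        ("gene1", {"A": "MK", "B": "ML"}),
        ("gene2", {"A": "TTT", "C": "TSA"}),
    ]
    matrix, partitions = concatenate(toy)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    for name, first, last in partitions:
        print(f"{name} = {first}-{last}")
```

Recording the partition boundaries at concatenation time is what allows a separate substitution model to be assigned to each gene later on.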
Summary statistics
Dozens of graphs
Detect outlier taxa/genes
Fast plot switching
Publication ready figures
Concatenation
Less is better
8 data sets:
5 software:
Made the home page as a featured project of the Kivy framework
Phylogenomics
Population genomics
Software development
Acknowledgements:
CIFC
CoBiG2
PhD grant: SFRH/BD/86736/2012
Project grant: PTDC/AGR-GPL/119943/2010