Diogo Nuno Proença Rico Silva
Studies evolutionary patterns above the species level
Provides insights on how taxonomic groups evolved
Data usually consists of DNA or protein sequences from distantly related taxa
Studies evolutionary patterns below the species level
Unravels the evolutionary history of populations within a species
Data usually consists of DNA sequences from closely related taxa
In this project:
Phylogenomics of the Basidiomycota focusing on the rust fungi
Population genomics of H. vastatrix,
the causal agent of Coffee Leaf Rust
Software development:
TriFusion: Streamlining phylogenomic data gathering, processing and visualization
Stem rust
Puccinia graminis
Poplar rust
Melampsora spp.
Wheat leaf rust
Puccinia triticina
Soybean rust
Phakopsora spp.
Broad bean rust
Uromyces viciae-fabae
Coffee leaf rust
Hemileia vastatrix
Obligate biotrophy
Expansion of gene families resulting in high number of genes
Absence or loss of genes involved in nutrient uptake
Larger repertoire of small secreted proteins
Proliferation of transposons (mobile genetic elements)
"Big" genomes
Forgot how to survive outside the host
Excel at nullifying the host's defenses
Really, "big" genomes
What?
Published in PLOS One
48 complete genomes
+
21 EST data sets
Ortholog assembly
(OrthoMCL + HaMSTr)
Pre-alignment QC
Alignment of putative orthologs
Post-alignment QC
Removal of saturated alignments
Removal of outlier alignments
Missing data filtering and file conversion
Data set creation
(3093 orthologs)
Ortholog assembly workflow
~ 50 scripts
~ 5000 lines of code
9 months
Maximum Likelihood reconstruction using RAxML
3093 genes - 67 taxa
Detection of positive selection at the root branch of the rust fungi
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
104 genes (19.6%) with signatures of positive selection
289 amino acid sites across 72 genes
Profiling selected amino acid sites
Unique: single variant exclusive to rust fungi
Diversifying: multiple variants exclusive to rust fungi
Function annotation according to 21 KOG classes
Enrichment of several transport and metabolism classes
Reflection of extensive genomic changes of nutrient transport and uptake for obligate biotrophs
Enrichment of secondary metabolite biosynthesis
Possible triggers of plant defenses
Methods and software for gathering and processing phylogenomic data are severely lacking!
Episodic positive selection using branch-site model (PAML)
531 genes (nucleotide) - 37 taxa
71 sites (24%) across 45 genes with signatures of positive selection on conserved amino acid sites.
Transition from AGY to TCN requires at least 2 non-synonymous mutations to maintain the same amino acid.
Substantial shift in codon usage between rusts and non-rusts for these amino acids.
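The two-mutation claim for the serine codon families can be checked directly against the standard genetic code. The sketch below is only an illustration, not part of the analysis pipeline: it assumes the standard (NCBI table 1) code, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_neighbors(codon):
    """Yield every codon reachable by a single nucleotide substitution."""
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]


# The two serine codon families: AGY (AGT, AGC) and TCN (TCT, TCC, TCA, TCG).
agy = [c for c, aa in CODON_TABLE.items() if aa == "S" and c.startswith("AG")]
tcn = [c for c, aa in CODON_TABLE.items() if aa == "S" and c.startswith("TC")]

# Fewest substitutions needed to move between the two families.
min_steps = min(hamming(a, t) for a in agy for t in tcn)

# Intermediate codons on every shortest (two-step) path, with their amino acids.
intermediates = {}
for a in agy:
    for t in tcn:
        if hamming(a, t) == min_steps:
            for mid in single_step_neighbors(a):
                if hamming(mid, t) == 1:
                    intermediates[mid] = CODON_TABLE[mid]

print(min_steps)
print(intermediates)
```

Every intermediate codon on a shortest path encodes cysteine or threonine, so both substitutions are non-synonymous even though the start and end codons both encode serine.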
Why?
Software artifacts
Role of positive selection on codon usage
Under review in Systematic Biology
Orthology
Process
Statistics
Search for orthologs across multiple genomes
Filters orthologs according to gene copy number and number of taxa
Graphical and interactive exploration of ortholog clusters
Export orthologs as protein and/or nucleotide sequences
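Filtering by gene copy number and taxon representation can be sketched as follows. This is a minimal illustration and not TriFusion's actual code or API: the cluster layout (a dict mapping a cluster ID to a list of (taxon, gene ID) pairs), the function name and the parameter names are all assumptions.

```python
from collections import Counter


def filter_clusters(clusters, max_copies=1, min_taxa=2):
    """Keep clusters with at least `min_taxa` taxa and no taxon above `max_copies` genes.

    `clusters` is a hypothetical structure: {cluster_id: [(taxon, gene_id), ...]}.
    """
    kept = {}
    for cluster_id, members in clusters.items():
        # Count how many gene copies each taxon contributes to this cluster.
        copies = Counter(taxon for taxon, _ in members)
        if len(copies) >= min_taxa and max(copies.values(), default=0) <= max_copies:
            kept[cluster_id] = members
    return kept


# Toy input with made-up taxon and gene names.
clusters = {
    "cluster_1": [("taxon_A", "a1"), ("taxon_B", "b1"), ("taxon_C", "c1")],
    "cluster_2": [("taxon_A", "a2"), ("taxon_A", "a3"), ("taxon_B", "b2")],
    "cluster_3": [("taxon_A", "a4")],
}

single_copy = filter_clusters(clusters, max_copies=1, min_taxa=2)
print(sorted(single_copy))
```

With these settings only cluster_1 survives: cluster_2 has two copies in taxon_A and cluster_3 covers a single taxon.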
Complete proteomes
Sequence alignments
Concatenates/converts thousands of alignments
Collapses, filters, codes gaps, creates consensus
Supports custom partition schemes and substitution models
Supports 10+ popular alignment formats
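Concatenation with a partition scheme can be illustrated with a short sketch. It is not TriFusion's implementation: the input layout (a list of (name, {taxon: sequence}) pairs), the function name and the choice of "-" for missing data are assumptions made for the example.

```python
def concatenate(alignments, missing="-"):
    """Concatenate alignments into a supermatrix and record partition ranges.

    `alignments` is a hypothetical structure: [(name, {taxon: aligned_sequence}), ...].
    Taxa absent from an alignment are padded with the `missing` symbol.
    Returns (supermatrix, partitions) with 1-based, inclusive partition ranges.
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {name!r} are not of equal length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input with made-up gene and taxon names.
genes = [
    ("gene1", {"taxon_A": "ATG", "taxon_B": "ATA"}),
    ("gene2", {"taxon_A": "CC", "taxon_C": "CT"}),
]
supermatrix, partitions = concatenate(genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(partitions)
```

The partition ranges are what a partition file for a downstream tree-building program would be written from, one range per gene.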
Fast and efficient
Summary statistics for thousands of files in seconds
Dozens of graphical options to explore alignment data
Automatic detection of outlier genes/taxa
Plot fast switching for quick exploration
Generation of publication ready figures
Concatenation
2-52k alignments
40-376 taxa
17-567Mb
Accepted with changes in Molecular Plant Pathology
Hemileia vastatrix:
... is a biotrophic pathogen
... causes Coffee Leaf Rust worldwide
... with yield losses up to 35%
... has more than 50 pathotypes/races
... responsible for multiple outbreaks across Latin America
What do we know?
Diversity
Clonality
No differentiation
Structure
... by geography
... by pathotype
... by host
39 isolates (30 unique + 9 replicates) from CIFC collection
Pathotypes
20
Sampling age range
1954-2013
Hosts
9 diploids (C. canephora and others)
21 tetraploids (C. arabica, HDT, inter-specific hybrids)
Maximum Likelihood reconstruction using RAxML (~20k SNPs)
1. Evidence of population structure according to host
2. Near absent structure among C3 isolates
3. Ladder-like pattern at the base of the C3 group
Population structure and introgression
Diploid hosts
Tetraploid hosts
Almost complete population differentiation
3 isolates with allele sharing signal
Supports the scenario of hybridization and introgression
Emergence of the C3 group
Could the C3 group be the result of a recent introduction from diploid coffee hosts?
Divergence between C2 and C3 groups
Diversification of the C3 group
Recombination within the C3 group
Sexual
Clonal
Association index: Measures linkage disequilibrium between SNPs and compares with expected distribution under equilibrium
Significant evidence of recombination occurring within the C3 group
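The association index and its null distribution can be sketched as below. This is a simplified illustration under stated assumptions, not the software used for the analysis: genotypes are coded as one allele per locus per isolate, the index is taken in its classical form (observed variance of pairwise distances divided by the sum of per-locus variances, minus 1), and the null distribution comes from shuffling alleles within each locus. All function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import pvariance


def index_of_association(genotypes):
    """Classical I_A = V_obs / V_exp - 1 for one-allele-per-locus genotypes."""
    n_loci = len(genotypes[0])
    pairs = list(combinations(range(len(genotypes)), 2))
    # For every locus, a 0/1 mismatch score for each pair of isolates.
    per_locus = [
        [int(genotypes[i][j] != genotypes[k][j]) for i, k in pairs]
        for j in range(n_loci)
    ]
    # Pairwise distance = number of mismatching loci.
    totals = [sum(scores) for scores in zip(*per_locus)]
    v_obs = pvariance(totals)
    # Expected variance if loci were independent: sum of per-locus variances.
    v_exp = sum(pvariance(scores) for scores in per_locus)
    if v_exp == 0:
        raise ValueError("no polymorphic loci")
    return v_obs / v_exp - 1


def permutation_test(genotypes, n_perm=999, seed=0):
    """Compare the observed index with values from within-locus shuffling."""
    rng = random.Random(seed)
    observed = index_of_association(genotypes)
    columns = [list(col) for col in zip(*genotypes)]
    exceed = 0
    for _ in range(n_perm):
        # Shuffle each locus independently to break associations between loci.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if index_of_association(list(zip(*shuffled))) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: two clonal lineages, so all four loci are fully associated.
clonal = ["0000", "0000", "1111", "1111"]
observed, p_value = permutation_test(clonal, n_perm=199)
print(observed, p_value)
```

An index well above the permuted values indicates linkage disequilibrium, as expected under clonality; an index that falls inside the permuted distribution is compatible with recombination.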
Courses given (1):
- Python 101: From biologists to biologists
Talks given (6):
- LEAF seminar
- Encontros Scientia
- ASIC 2014/2016
- PDP2017
Code contributions (~3500):
- Kivy
- MMSeqs2
- ipyrad
Paper co-authorship (7):
- With Silva, M @ Microbial Genomics
- With Pina-Martins, F @ Molecular Ecology Resources
- With Batista, D @ Frontiers in Plant Science
- With Talhinhas, P @ Molecular Plant Pathology
- With Silva, SE @ Systematics and Biodiversity
- With Romeiras, MM @ PLOS One
- With Rodrigues, ASB @ PLOS One
Acknowledgements:
CIFC
CoBiG2