Universidade de Lisboa

Faculdade de Ciências

Doctoral seminar 2017

Phylogenomic and population genomic insights on the evolutionary history of rust fungi and Coffee Leaf Rust

Diogo Nuno Proença Rico Silva

Supervisors:

Dr. Dora Batista

Prof. Dr. Octávio S. Paulo

Phylogenomics

Population genomics

Studies evolutionary patterns above the species level
Provides insights into how taxonomic groups evolved
Data usually consists of DNA or protein sequences from distantly related taxa

Studies evolutionary patterns below the species level
Unravels the evolutionary history of populations within a species
Data usually consists of DNA sequences from closely related taxa

In this project:

Phylogenomics of the Basidiomycota focusing on the rust fungi

Population genomics of H. vastatrix,

the causal agent of Coffee Leaf Rust

Software development:

TriFusion: Streamlining phylogenomic data gathering, processing and visualization

The rust fungi

Stem rust

Puccinia graminis

Poplar rust

Melampsora spp.

Wheat leaf rust

Puccinia triticina

Soybean rust

Phakopsora spp.

Broad bean rust

Uromyces viciae-fabae

Coffee leaf rust

Hemileia vastatrix

The rust fungi

Life style:

Genomic characteristics:

Obligate biotrophy

Expansion of gene families resulting in high number of genes

Absence or loss of genes involved in nutrient uptake

Larger repertoire of small secreted proteins

Proliferation of transposons (mobile genetic elements)

"Big" genomes

Forgot how to survive outside the host

Excel at nullifying the host's defenses

Really, "big" genomes

What?

Part I

Phylogenomics

What was the role of adaptive genetic variation in genes shared by rusts and other Basidiomycota?

On the origin of the rusts:

Published in PLOS One

Phylogenomics | objectives

Detect the largest number of single-copy orthologs shared among a data set of 67 Basidiomycota and Ascomycota genomes and EST data sets

Screen for episodic selection acting on specific amino acids and determine the magnitude of the selection signal on the origin of the rust fungi

Annotate the candidate genes and check for functional classes enriched for positively selected genes

Phylogenomics | methods

48 complete genomes

21 EST data sets

Ortholog assembly

(OrthoMCL + HaMStR)

Pre-alignment QC

Alignment of putative orthologs

Post-alignment QC

Removal of saturated alignments

Removal of outlier alignments

Missing data filtering and file conversion

Data set creation

(3093 orthologs)
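The missing-data filtering step of the workflow can be illustrated with a minimal Python sketch. It assumes each alignment is held as a dict mapping taxon names to aligned sequences, and the function names, the `max_missing` threshold and the set of missing characters are hypothetical choices for illustration, not the ~50 scripts actually used in this project.

```python
def filter_columns(alignment, max_missing=0.5, missing_chars="-?"):
    """Drop alignment columns whose fraction of missing characters exceeds max_missing.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    missing_chars: characters treated as missing (add "N" for nucleotide data).
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Keep a column only if its share of missing characters is within the threshold
    keep = [
        i for i in range(length)
        if sum(seq[i] in missing_chars for seq in seqs) / len(seqs) <= max_missing
    ]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


def enough_taxa(alignment, min_taxa, missing_chars="-?"):
    """True if at least min_taxa sequences contain a non-missing character."""
    present = sum(
        any(c not in missing_chars for c in seq) for seq in alignment.values()
    )
    return present >= min_taxa
```

For example, `filter_columns({"taxA": "ATG-C", "taxB": "AT--C", "taxC": "A---C"})` keeps columns 1, 2 and 5 and returns `{"taxA": "ATC", "taxB": "ATC", "taxC": "A-C"}`; `enough_taxa` can then discard whole alignments that retain too few taxa.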

Ortholog assembly workflow

~ 50 scripts

~ 5000 lines of code

9 months

Phylogenomics | results

Maximum Likelihood reconstruction using RAxML

3093 genes - 67 taxa

Detection of positive selection at the root branch of the rust fungi

Phylogenomics | results

Episodic positive selection using the branch-site model (PAML)

531 genes (nucleotide) - 37 taxa

104 genes (19.6%) with signatures of positive selection

289 amino acid sites across 72 genes
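In the PAML branch-site test, a gene is usually flagged when the alternative model (ω2 free on the foreground branch, here the root of the rusts) fits significantly better than a null model with ω2 fixed at 1. The sketch below computes that likelihood ratio test from two log-likelihoods, using either χ² with 1 degree of freedom (the conservative choice) or a 50:50 mixture of a point mass at 0 and χ²₁. The function names and log-likelihood values are hypothetical; the last line only checks the 104/531 percentage above.

```python
import math


def chi2_sf_df1(x):
    """Survival function (upper tail) of the chi-square distribution with 1 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_alt, lnl_null, mixture=False):
    """Return (2*delta lnL, p-value) for the branch-site test of positive selection.

    lnl_alt: log-likelihood of the alternative model (omega2 estimated)
    lnl_null: log-likelihood of the null model (omega2 fixed to 1)
    mixture: if True, use the 50:50 mixture of 0 and chi2(1); otherwise chi2(1).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clip tiny negative values
    p = chi2_sf_df1(stat)
    if mixture and stat > 0:
        p /= 2.0  # half of the null mass sits at exactly 0
    return stat, p


# Hypothetical log-likelihoods for one gene
stat, p = branch_site_lrt(-10234.5, -10240.1)
share_selected = round(100 * 104 / 531, 1)  # 104 of 531 genes -> 19.6 %
```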

Profiling selected amino acid sites

Unique: single variant exclusive to rust fungi

Diversifying: multiple variants exclusive to rust fungi
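One way to read the two categories above is as a rule applied to each selected alignment column. The sketch assumes that "unique" means all rust taxa carry one residue never seen in non-rust taxa, and "diversifying" means rusts carry several residues, none of them seen outside the rusts; these definitions and the `classify_site` helper are an interpretation for illustration, not the exact criteria used in the published analysis.

```python
def classify_site(residues, is_rust, ignore="-X?"):
    """Classify one alignment column as 'unique', 'diversifying' or 'other'.

    residues: residues at the site, one per taxon (same order as is_rust)
    is_rust: booleans flagging which taxa are rust fungi
    ignore: gap/unknown characters left out of the comparison
    """
    rust = {r for r, flag in zip(residues, is_rust) if flag and r not in ignore}
    others = {r for r, flag in zip(residues, is_rust) if not flag and r not in ignore}
    # Any rust residue also found outside the rusts is not rust-exclusive
    if not rust or rust & others:
        return "other"
    return "unique" if len(rust) == 1 else "diversifying"
```

For instance, with two rusts followed by two other taxa, the column `"KKAA"` is unique, `"KRAA"` is diversifying and `"KAAA"` is neither.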

Phylogenomics | results

Functional annotation according to 21 KOG classes

Enrichment of several transport and metabolism classes

Reflects extensive genomic changes in nutrient transport and uptake in obligate biotrophs

Enrichment of secondary metabolite biosynthesis

Possible triggers of plant defenses
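The slides do not state which enrichment test was used. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched here with hypothetical parameter names: it asks how likely it is to see at least `k` positively selected genes in a KOG class by chance.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for functional-class enrichment.

    N: total genes tested; K: genes in the KOG class among them;
    n: positively selected genes; k: positively selected genes in the class.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of drawing k, k+1, ..., upper class members
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

With 21 KOG classes tested, the resulting p-values would normally also need a multiple-testing correction.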

Phylogenomics | conclusion

Phylogenomic analyses hold significant promise for identifying important evolutionary transitions through the detection of positive selection

Current methods allow researchers to pinpoint the action of natural selection to specific periods of evolutionary history and to specific regions of genes

The pervasive signal of positive selection suggests that the transition of the rust fungi to obligate biotrophy required significant adaptive changes in conserved genes

Methods and software for gathering and processing phylogenomic data are severely lacking!

Part II

TriFusion

Streamlining phylogenomic data gathering, processing and visualization

Under review in Systematic Biology

TriFusion | description

Orthology

Process

Statistics

Search for orthologs across multiple genomes

Filters orthologs according to gene copy number and number of taxa

Graphical and interactive exploration of ortholog clusters

Export orthologs as protein and/or nucleotide sequences
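Filtering ortholog clusters by gene copy number and number of taxa can be sketched as follows. The input is assumed to follow the OrthoMCL-style group layout `group_id: taxon|gene taxon|gene ...`; the functions and thresholds are hypothetical and are not TriFusion's own code.

```python
from collections import Counter


def parse_group_line(line):
    """Parse 'group_id: taxon|gene taxon|gene ...' into (group_id, [taxon, ...])."""
    group_id, members = line.split(":", 1)
    taxa = [member.split("|", 1)[0] for member in members.split()]
    return group_id.strip(), taxa


def filter_groups(lines, max_copies=1, min_taxa=1):
    """Yield IDs of groups where no taxon has more than max_copies genes
    and at least min_taxa distinct taxa are represented."""
    for line in lines:
        if not line.strip():
            continue
        group_id, taxa = parse_group_line(line)
        counts = Counter(taxa)  # genes per taxon in this cluster
        if counts and max(counts.values()) <= max_copies and len(counts) >= min_taxa:
            yield group_id
```

Setting `max_copies=1` keeps only single-copy orthologs, while `min_taxa` controls how many taxa must be represented.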

Complete proteomes

Sequence alignments

Concatenates/converts thousands of alignments

Collapses, filters, codes gaps, creates consensus sequences

Supports custom partition schemes and substitution models

Supports 10+ popular alignment formats
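Concatenation with partitions can be sketched in a few lines: gene alignments are joined into a supermatrix, taxa missing from a gene are padded with gaps, and each gene's coordinates are recorded in a RAxML-style partition file. The data layout and function names are assumptions for illustration, not TriFusion's implementation.

```python
def concatenate(alignments, missing="-"):
    """Concatenate gene alignments into a supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Returns (supermatrix dict, [(gene, start, end), ...]) with 1-based inclusive ranges.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    matrix = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences must be non-empty and equally long")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this gene are filled with missing characters
            matrix[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}, partitions


def raxml_partition_lines(partitions, model="DNA"):
    """Format partitions as RAxML-style lines such as 'DNA, gene1 = 1-300'."""
    return [f"{model}, {gene} = {start}-{end}" for gene, start, end in partitions]
```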

Fast and efficient

Summary statistics for thousands of files in seconds

Dozens of graphical options to explore alignment data

Automatic detection of outlier genes/taxa

Fast plot switching for quick exploration

Generation of publication-ready figures
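The slides do not say how TriFusion detects outlier genes or taxa. As an illustration only, the sketch swaps in Tukey's fences (values beyond 1.5 × IQR from the quartiles), applied to one per-alignment summary statistic, the proportion of missing data. The function names and the choice of statistic are assumptions.

```python
import statistics


def missing_proportion(alignment, missing_chars="-?"):
    """Fraction of missing characters across all sequences of one alignment."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(c in missing_chars for seq in alignment.values() for c in seq)
    return missing / total if total else 0.0


def tukey_outliers(values, k=1.5):
    """Return names whose value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    values: dict mapping a gene (or taxon) name to a numeric statistic
    (at least two entries are required by statistics.quantiles).
    """
    q1, _, q3 = statistics.quantiles(list(values.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, v in values.items() if v < low or v > high)
```

A typical use would be `tukey_outliers({gene: missing_proportion(aln) for gene, aln in alignments.items()})`.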

TriFusion | benchmarks

Concatenation

2-52k alignments

40-376 taxa

17-567Mb

TriFusion | conclusions

TriFusion is an easy-to-install, easy-to-use and feature-rich application that allows researchers with no bioinformatics experience to gather, process and visualize large phylogenomic data sets

Provides a comprehensive suite of complex and computationally intensive operations with unparalleled performance

Has the potential to open up the execution of phylogenomic studies to a much wider community by removing the requirements for bioinformatics/programming expertise

Part III

Population genomics

Using RAD sequencing (RADseq) to investigate the population genomics of Hemileia vastatrix

Accepted with changes in Molecular Plant Pathology

Pop genomics | introduction

Hemileia vastatrix:

... is a biotrophic pathogen

... causes Coffee Leaf Rust worldwide

... with yield losses up to 35%

... more than 50 pathotypes/races

... responsible for multiple outbreaks across Latin America

What do we know?

Diversity

Low
Moderate
High

Clonality

Mostly evidence of clonality
Recombination in some regions
Existence of cryptosexuality

Structure

No differentiation

... by geography

... by pathotype

... by host

Pop genomics | objectives

Produce thousands of high quality SNPs for H. vastatrix using RAD-sequencing and technical replicates

Investigate the genetic structure of H. vastatrix, with focus on how it impacts the evolutionary potential

Test the clonality (or not) of H. vastatrix
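Technical replicates make it possible to measure genotyping error directly: the same isolate sequenced twice should yield the same calls. The sketch below computes the discordance rate for a replicate pair and collects SNPs that disagree in any pair, so they can be removed. The genotype encoding (`'A/G'`, `None` for missing) and the function names are hypothetical.

```python
def _norm(genotype):
    """Order alleles so that 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def discordance(calls_a, calls_b):
    """Fraction of shared loci whose genotype calls differ between two replicates.

    calls_a, calls_b: dicts mapping SNP id -> genotype such as 'A/G', or None if
    the locus was not called. Returns None if no locus is called in both.
    """
    shared = [snp for snp, gt in calls_a.items()
              if gt is not None and calls_b.get(snp) is not None]
    if not shared:
        return None
    mismatches = sum(_norm(calls_a[snp]) != _norm(calls_b[snp]) for snp in shared)
    return mismatches / len(shared)


def discordant_snps(replicate_pairs):
    """SNP ids whose calls disagree within at least one replicate pair."""
    bad = set()
    for calls_a, calls_b in replicate_pairs:
        for snp, gt in calls_a.items():
            other = calls_b.get(snp)
            if gt is not None and other is not None and _norm(gt) != _norm(other):
                bad.add(snp)
    return bad
```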

Pop genomics | sampling

39 isolates (30 unique + 9 replicates) from CIFC collection

Pathotypes

Sampling age range

1954-2013

Hosts

9 diploids (C. canephora and others)

21 tetraploids (C. arabica, HDT, interspecific hybrids)

Pop genomics | results

Maximum Likelihood reconstruction using RAxML (~20k SNPs)

1. Evidence of population structure according to host

2. Nearly absent structure among C3 isolates

3. Ladder-like pattern at the base of the C3 group

Pop genomics | results

Population structure and introgression

Diploid hosts

Tetraploid hosts

Almost complete population differentiation

3 isolates with an allele-sharing signal

Supports the scenario of hybridization and introgression
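The slides do not name the statistic behind "almost complete population differentiation". One standard way to quantify it is Hudson's Fst (ratio of averages across SNPs), sketched here as a swapped-in illustration. Values near 1 indicate nearly fixed differences between isolates from diploid and tetraploid hosts. Treating `n1` and `n2` as the number of sampled alleles per group is an assumption.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst (ratio of averages) between two populations.

    p1, p2: per-SNP frequencies of the same allele in each population
    n1, n2: number of sampled alleles per population (assumed constant across SNPs)
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Numerator: squared frequency difference minus sampling-variance corrections
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den
```

Because of the sample-size correction, the estimate can be slightly negative when the two groups do not differ.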

Pop genomics | results

Emergence of the C3 group

Could the C3 group be the result of a recent introduction from diploid coffee hosts?

Divergence between C2 and C3 groups

Diversification of the C3 group

Pop genomics | results

Recombination within the C3 group

Sexual

Clonal

Association index: measures linkage disequilibrium between SNPs and compares it with the distribution expected under linkage equilibrium

Significant evidence of recombination occurring within the C3 group
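The index of association (I_A) compares the observed variance of pairwise multilocus distances with the variance expected if loci were independent. rbarD is its standardized form, which is less sensitive to the number of loci. Clonal populations give values well above 0, and freely recombining ones give values near 0. The sketch below computes both and a permutation p-value that shuffles each locus independently. It treats each genotype call as one categorical value and is a simplified illustration rather than the exact analysis used; the same statistics are available in, for example, the `ia()` function of the R package poppr.

```python
import itertools
import math
import random
import statistics


def index_of_association(genotypes):
    """Return (I_A, rbarD) for a list of multilocus genotypes.

    genotypes: one tuple per isolate, one categorical call per locus.
    The distance at a locus is 0 if two isolates share the call, otherwise 1.
    """
    pairs = list(itertools.combinations(genotypes, 2))
    n_loci = len(genotypes[0])
    # per_locus[j][p]: distance at locus j for pair p
    per_locus = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    totals = [sum(col[p] for col in per_locus) for p in range(len(pairs))]
    var_loci = [statistics.pvariance(col) for col in per_locus]
    v_obs = statistics.pvariance(totals)  # observed variance of summed distances
    v_exp = sum(var_loci)                 # expected under linkage equilibrium
    denom = 2 * sum(math.sqrt(vj * vk) for vj, vk in itertools.combinations(var_loci, 2))
    if v_exp == 0 or denom == 0:
        raise ValueError("need at least two polymorphic loci")
    return v_obs / v_exp - 1, (v_obs - v_exp) / denom


def permutation_test(genotypes, n_perm=999, seed=1):
    """One-sided permutation p-value for rbarD, shuffling each locus independently."""
    rng = random.Random(seed)
    observed = index_of_association(genotypes)[1]
    columns = [list(col) for col in zip(*genotypes)]
    hits = 0
    for _ in range(n_perm):
        for col in columns:
            rng.shuffle(col)  # breaks associations between loci, keeps allele counts
        if index_of_association(list(zip(*columns)))[1] >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For perfectly linked loci (a purely clonal sample), rbarD equals 1. The pairwise loop is written for clarity, not speed, so tens of thousands of SNPs would need a vectorized version.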

Pop genomics | conclusions

Multiple divergent and genetically isolated lineages within H. vastatrix suggest the presence of a cryptic species complex

Allele sharing between isolated lineages warns of the possibility of exchange of virulence factors

Presence of H. vastatrix isolates in tetraploid hosts could be the result of a recent introduction followed by a specialization process

Recombination (through an unknown process) is likely to occur within isolates infecting C. arabica and to be a major source of genetic variation

Other outputs / contributions

Courses given (1):

- Python 101: From biologists to biologists

Talks given (6):

- LEAF seminar

- Encontros Scientia

- ASIC 2014/2016

- PDP2017

Code contributions (~3500):

- Kivy

- MMSeqs2

- ipyrad

Paper co-authorship (7):

- With Silva, M @ Microbial genomics

- With Pina-Martins, F @ Molecular Ecology Resources

- With Batista, D @ Frontiers in Plant Science

- With Talhinhas, P @ Molecular Plant Pathology

- With Silva, SE @ Systematics and Biodiversity

- With Romeiras, MM @ PLOS One

- With Rodrigues, ASB @ PLOS One

Thank you for your attention

Acknowledgements:

CIFC

CoBiG2