What you should know about graph-based reference genomes
What are they?
Why should they be used?
How can you use them?
(linear) reference genome
Also common in the population
Graph-based reference genome containing both sequences
UC Santa Cruz Genomics Institute
Chr17, GRCh38: alt loci
To account for natural genetic variation, we consider the more general case in which a reference genome is represented by a graph rather than a set of phased chromosomes; the latter is treated as a special case.
Paten et al. (2014)
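To make the idea concrete, here is a minimal sketch of a sequence graph in which the linear reference is just one path among several. The class name `SequenceGraph`, its methods, and the example sequences are all hypothetical and chosen for illustration; this is not the data model of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: nodes carry DNA strings, edges join node IDs."""
    nodes: dict = field(default_factory=dict)   # node ID -> sequence
    edges: set = field(default_factory=set)     # (source ID, target ID)

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def spell(self, path):
        """Return the sequence spelled by a walk, checking each step is an edge."""
        for src, dst in zip(path, path[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src} -> {dst}")
        return "".join(self.nodes[n] for n in path)


# Hypothetical example: one SNP (A/G) flanked by shared sequence.
graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")    # allele on the linear reference
graph.add_node(3, "G")    # allele also common in the population
graph.add_node(4, "TTC")
for edge in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(*edge)

reference_path = [1, 2, 4]      # the linear reference as a special case
alternative_path = [1, 3, 4]
reference_seq = graph.spell(reference_path)
alternative_seq = graph.spell(alternative_path)
```

Both sequences live in the same structure, and a purely linear reference corresponds to a graph with a single path.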
Reference genome of malaria-infected mosquitoes
Harding et al.
- Genomes from over 700 mosquitoes
- Used for variant calling
Population reference graphs for tuberculosis bacteria
Mouzos et al.
- 300 tuberculosis genomes in one graph
- Used for mapping and visualization
De novo assembly and genotyping of variants using colored de Bruijn graphs
Iqbal et al., Nature Genetics 2012
- Variant calling on a human graph-based reference genome
You can use GRCh38, however:
- Few tools use the alternative loci
- Where to find annotated data?
You can create your own using vg:
- Building a graph from the 1000 Genomes VCF requires 200+ GB of memory and ~1.5 TB of disk space
- Compiling vg can be difficult
- No annotated data
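For intuition about what building a graph from a reference plus a variant list involves, here is a toy sketch that handles SNPs only. The function `build_snp_graph` and its input format are hypothetical; this is not how vg is implemented, and it ignores indels, VCF parsing, and the memory and disk issues noted above.

```python
def build_snp_graph(reference, snps):
    """Split a reference at SNP positions, adding one node per alternative allele.

    reference: DNA string.
    snps: dict mapping 0-based position -> alternative base (toy input format).
    Returns (nodes, edges), where nodes maps node ID -> sequence.
    """
    nodes, edges = {}, set()
    next_id = 1
    previous = []   # node IDs that must connect to the next node created
    start = 0       # start of the reference stretch not yet placed in a node

    def add(sequence, parents):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        nodes[node_id] = sequence
        edges.update((parent, node_id) for parent in parents)
        return node_id

    for pos in sorted(snps):
        if pos > start:                        # shared stretch before the SNP
            previous = [add(reference[start:pos], previous)]
        ref_node = add(reference[pos], previous)   # reference allele
        alt_node = add(snps[pos], previous)        # alternative allele
        previous = [ref_node, alt_node]
        start = pos + 1
    if start < len(reference):                 # shared stretch after the last SNP
        add(reference[start:], previous)
    return nodes, edges


# Hypothetical input: an 8 bp reference with one A -> G SNP at position 4.
nodes, edges = build_snp_graph("ACGTATTC", {4: "G"})
```

Each SNP produces a small "bubble" of two parallel nodes, so the node count grows with the number of variants; that growth is one reason whole-genome graphs built from large variant sets are expensive.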
Check out: