What you should know about
What are they?
Why should they be used?
How can you use them?
UC Santa Cruz Genomics Institute
Chr17, GRCh38
Alt loci
To account for natural genetic variation, we consider the more general case in which a reference genome is represented by a graph rather than a set of phased chromosomes; the latter is treated as a special case.
Paten et al. (2014)
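To make the idea concrete, here is a minimal sketch of a reference represented as a graph, with the linear chromosome as one path through it. The class name `SequenceGraph`, its methods, and the sequences are hypothetical and chosen for illustration; this is not the data model of any particular tool.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph: nodes hold sequence, edges connect node ids."""

    def __init__(self):
        self.nodes = {}                # node id -> sequence
        self.edges = defaultdict(set)  # node id -> successor node ids
        self.paths = {}                # path name -> list of node ids

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_path(self, name, node_ids):
        """Register a named walk and create the edges it uses."""
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges[a].add(b)
        self.paths[name] = list(node_ids)

    def path_sequence(self, name):
        """Spell out the sequence along a named path."""
        return "".join(self.nodes[n] for n in self.paths[name])


graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")     # allele on the primary path (made-up data)
graph.add_node(3, "G")     # alternative allele (made-up data)
graph.add_node(4, "TTCA")

# The linear reference is the special case of a single path.
graph.add_path("primary", [1, 2, 4])
graph.add_path("alt", [1, 3, 4])

print(graph.path_sequence("primary"))  # ACGTATTCA
print(graph.path_sequence("alt"))      # ACGTGTTCA
```

A graph with only the "primary" path is equivalent to a linear reference; every extra path adds variation without duplicating the shared sequence.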
Reference genome of malaria-infected mosquitoes
Harding et al.
De novo assembly and genotyping of variants using colored de Bruijn graphs
Iqbal et al., Nature Genetics 2012
Population reference graphs for tuberculosis bacteria
Mouzos et al.
You can use GRCh38, however:
- Few tools use the alternative loci
- Be aware of flanking regions
- Where to find annotated data?
You can create your own using vg:
- Building a graph from the 1000 Genomes VCF requires 200+ GB of memory and ~1.5 TB of disk space
- Compiling vg can be hard
- No annotated data
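For intuition about what graph construction from a reference plus variants involves, here is a toy sketch in Python. It is not vg and does not use vg's formats or API. The function `construct_graph` is hypothetical, handles only single-nucleotide variants given as (0-based position, alternative base) pairs, and assumes they are sorted with distinct positions.

```python
def construct_graph(reference, snvs):
    """Build a toy variation graph from a reference string and SNVs.

    snvs: list of (0-based position, alt base), sorted, distinct positions.
    Returns (nodes, edges): nodes maps id -> sequence, edges is a set of
    (from_id, to_id) pairs.
    """
    nodes, edges = {}, set()
    next_id = 1
    prev_ids = []  # nodes that end just before the current position
    cursor = 0     # first reference position not yet placed in a node

    def add(seq, preds):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        nodes[node_id] = seq
        for p in preds:
            edges.add((p, node_id))
        return node_id

    for pos, alt in snvs:
        if pos > cursor:
            # Shared sequence between the previous variant and this one.
            prev_ids = [add(reference[cursor:pos], prev_ids)]
        ref_id = add(reference[pos], prev_ids)  # reference allele
        alt_id = add(alt, prev_ids)             # alternative allele
        prev_ids = [ref_id, alt_id]
        cursor = pos + 1

    if cursor < len(reference):
        add(reference[cursor:], prev_ids)
    return nodes, edges


# Made-up reference and variant, for illustration only.
nodes, edges = construct_graph("ACGTATTCA", [(4, "G")])
print(nodes)          # {1: 'ACGT', 2: 'A', 3: 'G', 4: 'TTCA'}
print(sorted(edges))  # [(1, 2), (1, 3), (2, 4), (3, 4)]
```

Each variant site adds nodes and edges, which gives a rough sense of why a genome-wide graph with millions of variants becomes expensive to build and index.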
Conclusion: Depends on what you want to use it for
Black: Hierarchical partitioning (used today in GRCh38)
Red: Sequential partitioning
"Flanking" regions of the alternative locus have been merged with the main path, revealing that the two genes start at the same position.
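The merging step can be sketched as follows, under simplifying assumptions: the alternative locus has already been placed over the corresponding stretch of the main path, and the flanks match exactly. The function `shared_flank_lengths` and the sequences are hypothetical; real flanking regions may contain mismatches and would need an alignment instead of exact comparison.

```python
def shared_flank_lengths(main_seq, alt_seq):
    """Return (left, right): lengths of the exactly matching prefix and
    suffix shared by the main-path region and the alternative locus."""
    limit = min(len(main_seq), len(alt_seq))

    left = 0
    while left < limit and main_seq[left] == alt_seq[left]:
        left += 1

    right = 0
    # Stop before the suffix would run into the already matched prefix.
    while right < limit - left and main_seq[-1 - right] == alt_seq[-1 - right]:
        right += 1

    return left, right


# Made-up sequences, for illustration only.
main_region = "AAACCCGGGTTT"
alt_locus = "AAACTTAGGTTT"

left, right = shared_flank_lengths(main_region, alt_locus)
main_core = main_region[left:len(main_region) - right]
alt_core = alt_locus[left:len(alt_locus) - right]

print(left, right)          # 4 5
print(main_core, alt_core)  # CCG TTA
```

Only the two core sequences need separate branches in the graph; the flanks are represented once on the main path, so features that start inside a flank get the same coordinate on both branches.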
Example case:
Do SNPs associated with a disease overlap more with open chromatin in one cell type vs others?
Genomic HyperBrowser
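As a rough illustration of the question on a linear reference, the sketch below computes the fraction of SNPs that fall inside open-chromatin intervals for each cell type. It is not how the Genomic HyperBrowser works; the function `fraction_in_intervals`, the cell type names, and all coordinates are made up, and the intervals are assumed to be sorted, non-overlapping, and half-open. A real analysis would also need a statistical test against a null model.

```python
from bisect import bisect_right


def fraction_in_intervals(positions, intervals):
    """Fraction of positions inside sorted, non-overlapping,
    half-open (start, end) intervals."""
    if not positions:
        return 0.0
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in positions:
        # Index of the last interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(positions)


# Made-up SNP positions and open-chromatin intervals on one chromosome.
snps = [105, 250, 410, 800]
open_chromatin = {
    "cell_type_A": [(100, 200), (400, 500)],
    "cell_type_B": [(240, 260)],
}

fractions = {
    name: fraction_in_intervals(snps, intervals)
    for name, intervals in open_chromatin.items()
}
print(fractions)  # {'cell_type_A': 0.5, 'cell_type_B': 0.25}
```

On a graph-based reference the same question requires coordinates that stay meaningful across alternative paths, which is one reason the choice of partitioning matters.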
Assumptions:
Check out: