Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
- 2 August 2019
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 37 (8), 907-915
- https://doi.org/10.1038/s41587-019-0201-4
Abstract
The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.
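The graph Ferragina Manzini (FM) index generalizes the ordinary FM index from a single string to a graph of sequences. As a minimal illustration of the underlying idea only, the sketch below builds a linear FM index over a toy string (naive suffix array, Burrows-Wheeler transform, C table and occurrence table) and finds a pattern by backward search. It does not model HISAT2's graph index, its hierarchical local indexes, or its handling of variants and haplotypes; the function names `build_fm_index`, `backward_search`, `count_occurrences` and `locate` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM index for `text` (naive construction, for illustration only)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    # Naive suffix array: suffix start positions sorted lexicographically.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (s[-1] wraps to '$').
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # C[c]: number of characters in s strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    return bwt, C, occ, sa


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching `pattern`."""
    lo, hi = 0, n
    # Extend the match one character at a time, right to left (LF mapping).
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count_occurrences(text, pattern):
    bwt, C, occ, _ = build_fm_index(text)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return hi - lo


def locate(text, pattern):
    """Return sorted 0-based start positions of `pattern` in `text`."""
    bwt, C, occ, sa = build_fm_index(text)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])


print(count_occurrences("ACGTACGTAC", "AC"))  # 3
print(locate("ACGTACGTAC", "GTA"))            # [2, 6]
```

The quadratic suffix sorting and full occurrence tables are chosen for readability; practical aligners use dedicated suffix-array construction and sampled occurrence counts to keep memory low.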
Funding Information
- Cancer Prevention and Research Institute of Texas (RR170068)
- U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (R01-HG006677, R01-HG006102)