A sorghum practical haplotype graph facilitates genome‐wide imputation and cost‐effective genomic prediction
Open Access
- 25 March 2020
- journal article
- research article
- Published by Wiley in The Plant Genome
- Vol. 13 (1), e20009
- https://doi.org/10.1002/tpg2.20009
Abstract
Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to accelerate cultivar development. To help with this, we developed a Sorghum bicolor Practical Haplotype Graph (PHG) pangenome database that stores haplotypes and variant information. We developed two PHGs in sorghum that were used to identify genome‐wide variants for 24 founders of the Chibas sorghum breeding program from 0.01x sequence coverage. The PHG called single nucleotide polymorphisms (SNPs) with 5.9% error at 0.01x coverage, only 3% higher than PHG error when calling SNPs from 8x coverage sequence. Additionally, 207 progenies from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes were imputed from PHG parental haplotypes and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from .57 to .73 and are similar to prediction accuracies obtained with genotyping‐by‐sequencing or targeted amplicon sequencing (rhAmpSeq) markers. This study demonstrates the use of a sorghum PHG to impute SNPs from low‐coverage sequence data and shows that the PHG can unify genotype calls across multiple sequencing platforms. By reducing input sequence requirements, the PHG can decrease the cost of genotyping, make GS more feasible, and facilitate larger breeding populations. Our results demonstrate that the PHG is a useful research and breeding tool that maintains variant information from a diverse group of taxa, stores sequence data in a condensed but readily accessible format, unifies genotypes across genotyping platforms, and provides a cost‐effective option for genomic selection.
Funding Information
- Advanced Research Projects Agency - Energy (DE‐AR0000598)
- Agricultural Research Service
- United States Agency for International Development (AID‐OAA‐LA‐16‐00003)
- Bill and Melinda Gates Foundation