Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years
- 11 April 2014
- journal article
- research article
- Published by Springer Science and Business Media LLC in Theoretical and Applied Genetics
- Vol. 127 (6), 1375-1386
- https://doi.org/10.1007/s00122-014-2305-z
Abstract
The calibration data for genomic prediction should represent the full genetic spectrum of a breeding program. Data heterogeneity is minimized by connecting data sources through highly related test units. One of the major challenges of genome-enabled prediction in plant breeding lies in the optimum design of the population employed in model training. With highly interconnected breeding cycles staggered in time, the choice of data for model training is not straightforward. We used cross-validation and independent validation to assess the performance of genome-based prediction within and across genetic groups, testers, locations, and years. The study comprised data for 1,073 and 857 doubled haploid lines evaluated as testcrosses in 2 years. Testcrosses were phenotyped for grain dry matter yield and content and genotyped with 56,110 single nucleotide polymorphism markers. Predictive abilities strongly depended on the relatedness of the doubled haploid lines from the estimation set with those on which prediction accuracy was assessed. For scenarios with strong population heterogeneity, it was advantageous to perform predictions within a priori defined genetic groups until higher connectivity through related test units was achieved. Differences between group means had a strong effect on predictive abilities obtained with both cross-validation and independent validation. Predictive abilities across subsequent cycles of selection and years were only slightly reduced compared to predictive abilities obtained with cross-validation within the same year. We conclude that the optimum data set for model training in genome-enabled prediction should represent the full genetic and environmental spectrum of the respective breeding program. Data heterogeneity can be reduced by experimental designs that maximize the connectivity between data sources by common or highly related test units.
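The two validation schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not state which prediction model was used, so this example assumes ridge regression on marker genotypes with a fixed penalty, and it takes predictive ability to be the correlation between predicted and observed values. The data are simulated toy data (two hypothetical groups of fully homozygous lines coded 0/2), and every name and parameter value below is an assumption for illustration. The toy setup is not meant to reproduce the study's results.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on markers, solved in its n x n dual form."""
    mu_x = X_train.mean(axis=0)          # center markers on training means
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    K = Xc @ Xc.T                        # marker-based similarity of training lines
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu_y)
    return mu_y + (X_test - mu_x) @ (Xc.T @ alpha)

def predictive_ability(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])

def kfold_cv(X, y, k=5, lam=1.0, seed=0):
    """Mean predictive ability over k random folds (cross-validation)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    abilities = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        pred = ridge_fit_predict(X[train], y[train], X[test], lam)
        abilities.append(predictive_ability(y[test], pred))
    return float(np.mean(abilities))

# Toy data (assumed, not from the study): two groups that differ in
# allele frequencies and in their phenotypic mean.
rng = np.random.default_rng(42)
n_lines, n_markers = 150, 50
effects = rng.normal(size=n_markers)

def simulate_group(freqs, group_mean):
    geno = 2.0 * rng.binomial(1, freqs, size=(n_lines, n_markers))
    pheno = group_mean + geno @ effects + rng.normal(size=n_lines)
    return geno, pheno

X_a, y_a = simulate_group(rng.uniform(0.2, 0.8, n_markers), group_mean=0.0)
X_b, y_b = simulate_group(rng.uniform(0.2, 0.8, n_markers), group_mean=5.0)

# Cross-validation within group A.
within_a = kfold_cv(X_a, y_a)

# Independent validation: train on group A, predict group B.
across = predictive_ability(y_b, ridge_fit_predict(X_a, y_a, X_b, lam=1.0))

print(f"within-group CV: {within_a:.2f}, across groups: {across:.2f}")
```

In practice the penalty would be tuned or replaced by estimated variance components, and the folds could be drawn so that related lines stay together, since the abstract reports that predictive ability depends strongly on relatedness between estimation and test sets.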