Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years

11 April 2014

journal article
research article
Published by Springer Science and Business Media LLC in Theoretical and Applied Genetics

Vol. 127 (6), 1375-1386
https://doi.org/10.1007/s00122-014-2305-z

Abstract

The calibration data for genomic prediction should represent the full genetic spectrum of a breeding program. Data heterogeneity is minimized by connecting data sources through highly related test units.

One of the major challenges of genome-enabled prediction in plant breeding lies in the optimum design of the population employed in model training. With highly interconnected breeding cycles staggered in time, the choice of data for model training is not straightforward. We used cross-validation and independent validation to assess the performance of genome-based prediction within and across genetic groups, testers, locations, and years. The study comprised data for 1,073 and 857 doubled haploid lines evaluated as testcrosses in 2 years. Testcrosses were phenotyped for grain dry matter yield and content and genotyped with 56,110 single nucleotide polymorphism markers. Predictive abilities strongly depended on the relatedness of the doubled haploid lines from the estimation set with those on which prediction accuracy was assessed. For scenarios with strong population heterogeneity, it was advantageous to perform predictions within a priori defined genetic groups until higher connectivity through related test units was achieved. Differences between group means had a strong effect on predictive abilities obtained with both cross-validation and independent validation. Predictive abilities across subsequent cycles of selection and years were only slightly reduced compared to predictive abilities obtained with cross-validation within the same year. We conclude that the optimum data set for model training in genome-enabled prediction should represent the full genetic and environmental spectrum of the respective breeding program. Data heterogeneity can be reduced by experimental designs that maximize the connectivity between data sources by common or highly related test units.
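The statement that differences between group means strongly affect predictive abilities can be illustrated with a minimal sketch. It assumes that predictive ability is measured as the Pearson correlation between predicted and observed values, which the abstract does not state explicitly. The two groups, the toy numbers, and the helper name `pearson` are hypothetical and are not data or code from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented toy values: two genetic groups whose means differ by 10 units.
observed = {
    "group_A": [10.0, 11.0, 12.0, 13.0],
    "group_B": [20.0, 21.0, 22.0, 23.0],
}
# Hypothetical predictions that capture the group means well
# but rank lines within each group only weakly.
predicted = {
    "group_A": [11.5, 10.5, 12.5, 11.5],
    "group_B": [21.5, 20.5, 22.5, 21.5],
}

# Predictive ability computed separately within each group.
within = {g: pearson(predicted[g], observed[g]) for g in observed}

# Predictive ability computed on the pooled data across both groups.
pooled_obs = observed["group_A"] + observed["group_B"]
pooled_pred = predicted["group_A"] + predicted["group_B"]
pooled = pearson(pooled_pred, pooled_obs)

for group, r in within.items():
    print(f"{group}: within-group predictive ability = {r:.2f}")
print(f"pooled across groups: predictive ability = {pooled:.2f}")
```

With these toy numbers the within-group correlations are about 0.32, while the pooled correlation is about 0.98. The pooled value is driven almost entirely by the difference between the group means, which is why predictive abilities within and across groups should be compared with care.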

Cited by 96 articles