Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

Open Access

19 January 2021

journal article
research article
Published by MDPI AG in Animals

Vol. 11 (1), 241
https://doi.org/10.3390/ani11010241

Abstract

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence in the origin of the population. It would also help protect native genetic resources in each country's market. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were genotyped with a 600K high-density single nucleotide polymorphism (SNP) array and analyzed to determine the optimal marker combination comprising the minimum number of markers. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. In the marker selection process, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups and reduced the number of genetic components required, confirming that the groups could be classified efficiently with a small marker set. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations.
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine, from a large number of SNP markers, the optimal combination with the minimum number of markers that distinguishes the target population.
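
The abstract reports that the selected markers increased Fst between the case and control groups. As an illustration only, the following minimal sketch ranks SNPs by a simple two-group Fst. It assumes genotypes coded as 0/1/2 copies of one allele and uses the basic estimator Fst = (Ht - Hs) / Ht with sample-size weights; the abstract does not state which estimator the authors used, and the function names and toy data here are hypothetical.

```python
from typing import List, Sequence


def allele_freq(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele, given 0/1/2 genotype codes."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_two_groups(case: Sequence[int], control: Sequence[int]) -> float:
    """Simple Fst = (Ht - Hs) / Ht for one SNP and two groups.

    Assumption: a basic heterozygosity-based estimator with
    sample-size weights, not necessarily the one used in the study.
    """
    n1, n2 = len(case), len(control)
    p1, p2 = allele_freq(case), allele_freq(control)
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic SNP carries no information
    h_s = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)  # within groups
    return (h_t - h_s) / h_t


def rank_snps_by_fst(
    case_matrix: List[List[int]],
    control_matrix: List[List[int]],
    top_k: int,
) -> List[int]:
    """Return the column indices of the top_k SNPs by Fst.

    Rows are samples and columns are SNPs in both matrices.
    """
    n_snps = len(case_matrix[0])
    scores = []
    for j in range(n_snps):
        case_col = [row[j] for row in case_matrix]
        ctrl_col = [row[j] for row in control_matrix]
        scores.append((fst_two_groups(case_col, ctrl_col), j))
    # Highest Fst first; ties broken by SNP index for reproducibility.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:top_k]]


# Hypothetical toy data: 3 case samples, 3 control samples, 3 SNPs.
case_toy = [[2, 0, 1], [2, 0, 1], [2, 1, 1]]
control_toy = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
top_snps = rank_snps_by_fst(case_toy, control_toy, top_k=1)
print(top_snps)
```

In this toy data the first SNP is fixed for opposite alleles in the two groups, so it ranks first. A ranking of this kind could serve as a pre-filter before a classifier is trained on the retained SNPs, but that pairing is an assumption of this sketch, not a description of the authors' pipeline.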
