Analysis and Application of European Genetic Substructure Using 300 K SNP Information

Open Access

18 January 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 4 (1), e4
https://doi.org/10.1371/journal.pgen.0040004

Abstract

European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed that the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry, as well as distinguishing among northern European populations. In separate analyses of northern European participants, other substructure relationships were discerned, showing a west-to-east gradient. Applying this substructure information was critical for reducing false-positive signals in whole genome association (WGA) analyses of a real rheumatoid arthritis dataset in European Americans. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used to improve error rates in association testing of candidate genes and in replication studies of WGA scans.

Author Summary

Ancestry differences corresponding to ethnic groups may be important in determining disease risk factors and optimizing treatment. Our study further defines ancestry relationships among different European ethnic groups by examining over 300 thousand DNA variations in over 2,000 individuals. This allowed a clearer ascertainment of differences that could not be discerned in smaller studies using more limited numbers of DNA variations. We show clear differences among European American participants of different self-identified ethnic affiliation. The analyses showed multiple components of variation; the components showing the largest variations generally corresponded to the grandparental country or region of origin within Europe. We also show the importance of applying this information in determining genetic risk factors for complex diseases. Moreover, the results have enabled a better selection of smaller numbers of DNA variations that can be used in future disease studies to identify more homogeneous participant groups and to minimize false-positive and false-negative results in assessing genetic risk factors for disease.
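The abstract relies on PCA of a genome-wide genotype matrix and on using the resulting ancestry information to curb false-positive association signals. The Python sketch below illustrates those two ideas in generic form. It is not the authors' pipeline, and STRUCTURE is not shown. It applies an EIGENSOFT-style per-SNP normalization, takes the top PCs by SVD, and then computes an EIGENSTRAT-style statistic that regresses the PCs out of genotypes and phenotype. The function names (`normalize_genotypes`, `top_pcs`, `pc_adjusted_trend_chi2`), the simulated two-population data, and the choice of two PCs are all hypothetical and chosen only for illustration.

```python
import numpy as np


# Hypothetical helpers for illustration; not the study's actual code.
def normalize_genotypes(G):
    """Standardize an (individuals x SNPs) matrix of 0/1/2 allele counts.

    EIGENSOFT-style: center each SNP by its mean and divide by
    sqrt(p * (1 - p)), where p = mean / 2 is the sample allele frequency.
    Monomorphic SNPs are dropped; `keep` marks the SNPs retained.
    """
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=0)
    p = mu / 2.0
    keep = (p > 0) & (p < 1)
    X = (G[:, keep] - mu[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    return X, keep


def top_pcs(X, k):
    """Return the top-k PC scores (individuals x k) via a thin SVD."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


def pc_adjusted_trend_chi2(G, y, pcs):
    """EIGENSTRAT-style 1-d.f. statistic per SNP (illustrative sketch).

    Regress an intercept plus the given PCs out of every genotype column and
    out of the phenotype, then return (N - K - 1) * r^2, where r is the
    correlation of the residuals. With zero PCs this reduces to an
    unadjusted (N - 1) * r^2 trend statistic.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    N, K = pcs.shape
    C = np.column_stack([np.ones(N), pcs])

    def resid(M):
        beta, *_ = np.linalg.lstsq(C, M, rcond=None)
        return M - C @ beta

    Gr, yr = resid(G), resid(y)
    num = Gr.T @ yr
    den = np.sqrt((yr @ yr) * (Gr ** 2).sum(axis=0))
    # Guard against (near-)constant genotype columns.
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
    return (N - K - 1) * r ** 2


# Simulated example with assumed parameters: two populations with shifted
# allele frequencies, and case status confounded with population.
rng = np.random.default_rng(0)
n_per, m = 100, 2000
p = rng.uniform(0.1, 0.9, m)
d = rng.normal(0.0, 0.1, m)
pA = np.clip(p + d, 0.05, 0.95)
pB = np.clip(p - d, 0.05, 0.95)
G = np.vstack([rng.binomial(2, pA, size=(n_per, m)),
               rng.binomial(2, pB, size=(n_per, m))])
pop = np.r_[np.ones(n_per), np.zeros(n_per)]
y = (rng.random(2 * n_per) < np.where(pop == 1, 0.7, 0.3)).astype(float)

X, keep = normalize_genotypes(G)
pcs = top_pcs(X, k=2)
raw = pc_adjusted_trend_chi2(G, y, pcs[:, :0])  # no ancestry adjustment
adj = pc_adjusted_trend_chi2(G, y, pcs)         # adjusted for top 2 PCs
print("median chi2, unadjusted:", np.median(raw))
print("median chi2, PC-adjusted:", np.median(adj))
```

In this simulation, case status tracks population membership rather than any SNP. The unadjusted statistics should therefore be inflated relative to a 1-d.f. chi-square null, while the PC-adjusted statistics should sit closer to it. Regressing on top PCs is one common correction. The sketch does not claim that this is the exact procedure used in the study.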

Cited by 212 articles