Analysis and Application of European Genetic Substructure Using 300 K SNP Information

Abstract
European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans. Ancestry differences corresponding to ethnic groups may be important in determining disease risk factors and optimizing treatment. Our study further defines ancestry relationship among different European ethnic groups by examining over 300 thousand variations in DNA, in over 2,000 individuals. This study allowed a clearer ascertainment of differences that could not be discerned in smaller studies using more limited numbers of DNA variations. We show clear differences among European American participants of different self-identified ethnic affiliation. The analyses showed multiple components of variation. The components showing the largest variations generally corresponded to the grandparental country or region of origin within Europe. We also show the importance of applying this information in determining genetic risk factors for complex diseases. Moreover, the results have enabled a better selection of smaller numbers of DNA variations that can be used in future disease studies to identify more homogenous participant groups and minimize false positive and false negative results in assessing genetic risk factors for disease.