Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort
Open Access
- 19 June 2015
- journal article
- Published by Oxford University Press (OUP) in Genetics
- Vol. 200 (4), 1285-1295
- https://doi.org/10.1534/genetics.115.178616
Abstract
Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian–European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent–child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent–child pairs was largely due to intermarriage. The parent–child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.