An Arabidopsis Example of Association Mapping in Structured Samples

Open Access

1 January 2007

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 3 (1), e4
https://doi.org/10.1371/journal.pgen.0030004

Abstract

A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they cannot always be effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study. 
There is currently tremendous interest in using association mapping to find the genes responsible for natural variation, particularly for human disease. In association mapping, researchers seek to identify regions of the genome where individuals who are phenotypically similar (e.g., they all have the same disease) are also unusually closely related. A potentially serious problem is that spurious correlations may arise if the population is structured so that members of a subgroup tend to be much more closely related. We have previously demonstrated that this problem can be severe in Arabidopsis thaliana, and that established statistical methods for controlling for population structure are insufficient. Here, we evaluate a broader range of methods. We find that a recently introduced mixed-model approach generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls.
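
To make the confounding problem concrete, the following is a minimal simulation sketch, not the authors' code or data. It assumes two hypothetical subpopulations whose trait means differ, with no causal marker at all, and compares a naive per-marker regression against a simplified kinship-based mixed model. The function names (`simulate`, `kinship`, `fit_h2`, `wald_t2`, `scan`), the sample sizes, and the effect size are all illustrative assumptions. The variance ratio is chosen by a coarse grid search under the null model, and the kinship matrix is a plain standardized marker cross-product rather than the estimator used in the paper.

```python
import numpy as np

def simulate(n_per_pop=50, n_markers=200, seed=0):
    """Two subpopulations; the trait differs between them, but no marker is causal."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    # Independent allele frequencies per subpopulation give strong differentiation.
    freqs = rng.uniform(0.1, 0.9, size=(2, n_markers))
    geno = (rng.random((2 * n_per_pop, n_markers)) < freqs[pop]).astype(float)
    geno = geno[:, geno.std(axis=0) > 0]  # drop monomorphic markers
    y = 2.0 * pop + rng.normal(size=2 * n_per_pop)
    return geno, y

def kinship(geno):
    """Marker-based relatedness: cross-product of standardized genotypes."""
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    return z @ z.T / z.shape[1]

def fit_h2(y, K, grid=np.linspace(0.0, 0.99, 100)):
    """Grid-search ML estimate of h2 in Var(y) ~ h2*K + (1-h2)*I, intercept only."""
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)  # guard against tiny negative eigenvalues
    n = len(y)
    yr, ones = U.T @ y, U.T @ np.ones(n)
    best = (-np.inf, 0.0)
    for h2 in grid:
        d = h2 * s + (1.0 - h2)  # variances along the eigenvectors of K
        w = 1.0 / d
        mu = np.sum(w * ones * yr) / np.sum(w * ones * ones)
        rss = np.sum(w * (yr - mu * ones) ** 2)
        loglik = -0.5 * (n * np.log(rss / n) + np.sum(np.log(d)))
        best = max(best, (loglik, h2))
    return best[1], s, U

def wald_t2(y, X, w):
    """Squared Wald statistic for the last column of X in weighted least squares."""
    Xw = X * w[:, None]
    XtWX = X.T @ Xw
    beta = np.linalg.solve(XtWX, Xw.T @ y)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid ** 2) / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(XtWX)
    return beta[-1] ** 2 / cov[-1, -1]

def scan(geno, y):
    """Return naive and kinship-corrected statistics for every marker."""
    n = len(y)
    h2, s, U = fit_h2(y, kinship(geno))
    w = 1.0 / (h2 * s + (1.0 - h2))
    ones = np.ones(n)
    naive, mixed = [], []
    for g in geno.T:
        X = np.column_stack([ones, g])
        naive.append(wald_t2(y, X, ones))            # ordinary regression
        mixed.append(wald_t2(U.T @ y, U.T @ X, w))   # rotated, reweighted regression
    return np.array(naive), np.array(mixed), h2

if __name__ == "__main__":
    geno, y = simulate()
    naive, mixed, h2 = scan(geno, y)
    print("estimated h2:", h2)
    print("mean naive statistic:", naive.mean())
    print("mean mixed-model statistic:", mixed.mean())
```

Because every marker in this simulation is null, a well-calibrated test should give statistics averaging close to 1. The naive scan is expected to sit well above that, while the kinship-weighted scan should be noticeably closer, which mirrors the qualitative point of the abstract without reproducing any of its results.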

Cited by 578 articles