Native range genetic variation in Arabidopsis thaliana is strongly geographically structured and reflects Pleistocene glacial dynamics

Abstract
Despite Arabidopsis thaliana's pre-eminence as a model organism, major questions remain regarding the geographic structure of its genetic variation because previous studies relied on geographically incomplete sampling. Many of these questions are addressed here with an analysis of genome-wide variation at 10 loci in 475 individuals from 167 globally distributed populations, including many from critical but previously unsampled regions. Rooted haplotype networks at three loci suggest that A. thaliana arose in the Caucasus region. Identification of large-scale metapopulations indicates clear east–west genetic structure within both proposed Pleistocene refugia and regions colonized after the Pleistocene. The refugia themselves are genetically differentiated from one another and display elevated levels of within-population genetic diversity relative to recolonized areas. The timing of an inferred demographic expansion coincides with the Eemian interglacial (approximately 120 000 years ago). Taken together, these patterns are strongly suggestive of Pleistocene range dynamics. Spatial autocorrelation analyses indicate that isolation by distance is pervasive at all hierarchical levels, but that it is reduced in portions of Europe.