Robust Demographic Inference from Genomic and SNP Data

Open Access
Abstract
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with the current reference method in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.

Author Summary
We present a new likelihood-based method to infer the past demography of a set of populations from large genomic datasets. Our method can be applied to arbitrarily complex models because the likelihood is estimated by coalescent simulations. Under simple scenarios, our method behaves similarly to a widely used diffusion-based method while showing better convergence properties. In addition, our approach can be applied to very complex models including as many as a dozen populations, and still retrieve parameters very accurately in a reasonable time. We apply our approach to estimate the past demography of four human populations for which non-coding whole-genome diversity is available, estimating the degree of European admixture of an African American population from the southwestern United States and the admixture of a Kenyan population with an unsampled East African population. We also show the versatility of our framework by inferring the demographic history of African populations from SNP chip data with known ascertainment bias, and find a very old divergence time (>110 Ky) between the Yoruba of Western Africa and the San of Southern Africa.
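
To make the two central objects of the summary concrete, the following minimal Python sketch builds a single-population SFS from per-site derived-allele counts and scores it with a multinomial composite log-likelihood that treats SNPs as independent. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the expected SFS is taken from the standard neutral constant-size expectation (entry i proportional to 1/i) as a stand-in for the coalescent-simulation estimate described above.

```python
import math
from typing import List, Sequence


def site_frequency_spectrum(derived_counts: Sequence[int], n: int) -> List[int]:
    """Count SNPs by derived-allele count i = 1..n-1 in a sample of n chromosomes."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:  # keep polymorphic sites only; fixed sites are dropped
            sfs[c - 1] += 1
    return sfs


def neutral_expected_sfs(n: int) -> List[float]:
    """Stand-in expectation (assumption): constant-size neutral model, entry i ~ 1/i.
    In a simulation-based framework this vector would be estimated from simulations."""
    return [1.0 / i for i in range(1, n)]


def composite_log_likelihood(observed: Sequence[int], expected: Sequence[float]) -> float:
    """Multinomial log-likelihood of the observed SFS, up to an additive constant.
    `expected` is normalised to probabilities; SNPs are treated as independent."""
    total = sum(expected)
    loglik = 0.0
    for m, e in zip(observed, expected):
        if m == 0:
            continue  # empty entries contribute nothing
        if e <= 0.0:
            return float("-inf")  # observed SNPs in an entry the model forbids
        loglik += m * math.log(e / total)
    return loglik


# Toy data: derived-allele counts at 8 sites in a sample of n = 5 chromosomes.
counts = [1, 1, 2, 3, 1, 4, 0, 5]
obs = site_frequency_spectrum(counts, n=5)
ll_neutral = composite_log_likelihood(obs, neutral_expected_sfs(5))
# A deliberately poor model that expects mostly high-frequency variants.
ll_skewed = composite_log_likelihood(obs, list(reversed(neutral_expected_sfs(5))))
```

In this toy case the observed spectrum is rich in singletons, so the neutral expectation scores higher than the reversed one. Parameter inference would then amount to searching for the demographic parameters whose expected SFS maximises this score.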