Robust Demographic Inference from Genomic and SNP Data
Open Access
- 24 October 2013
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 9 (10), e1003905
- https://doi.org/10.1371/journal.pgen.1003905
Abstract
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic datasets.
Author Summary
We present a new likelihood-based method to infer the past demography of a set of populations from large genomic datasets. Our method can be applied to arbitrarily complex models as the likelihood is estimated by coalescent simulations. Under simple scenarios, our method behaves similarly to a widely used diffusion-based method while showing better convergence properties. In addition, our approach can be applied to very complex models including as many as a dozen populations, and still retrieve parameters very accurately in a reasonable time. We apply our approach to estimate the past demography of four human populations for which non-coding whole genome diversity is available, estimating the degree of European admixture of a southwest African American population and that of a Kenyan population with an unsampled East African population. We also show the versatility of our framework by inferring the demographic history of African populations from SNP chip data with known ascertainment bias, and find a very old divergence time (>110 Ky) between Yorubas from Western Africa and Sans from Southern Africa.
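To make the idea of a composite likelihood computed on the SFS more concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a single-population unfolded SFS from per-site derived-allele counts and scores it with a multinomial log-likelihood that treats sites as independent. The function names and the toy data are hypothetical. As a loudly stated assumption, the expected SFS here is the analytic expectation for a constant-size neutral population (entry i proportional to 1/i), used only as a stand-in for the expected SFS that the paper estimates by coalescent simulations.

```python
import math


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Count polymorphic sites by derived-allele count (classes 1..n-1).

    Sites where the derived allele is absent (0) or fixed (n) are skipped.
    """
    sfs = [0] * (n_chromosomes - 1)
    for count in derived_counts:
        if 0 < count < n_chromosomes:
            sfs[count - 1] += 1
    return sfs


def neutral_expected_probs(n_chromosomes):
    """Expected SFS proportions under a constant-size neutral model.

    ASSUMPTION: analytic stand-in (entry i proportional to 1/i) for an
    expected SFS that would otherwise be estimated from simulations.
    """
    weights = [1.0 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


def composite_log_likelihood(observed_sfs, expected_probs):
    """Multinomial log-likelihood of the observed SFS, sites treated as
    independent; the constant multinomial coefficient is dropped."""
    log_lik = 0.0
    for m_i, p_i in zip(observed_sfs, expected_probs):
        if m_i == 0:
            continue
        if p_i <= 0.0:
            return float("-inf")
        log_lik += m_i * math.log(p_i)
    return log_lik


# Toy data (hypothetical): derived-allele counts at 8 sites, 4 chromosomes.
toy_counts = [1, 1, 2, 3, 1, 0, 4, 2]
toy_sfs = site_frequency_spectrum(toy_counts, 4)
toy_loglik = composite_log_likelihood(toy_sfs, neutral_expected_probs(4))
```

In an actual inference, the expected probabilities would be recomputed for each candidate set of demographic parameters, and the parameters maximizing this quantity would be retained. The sketch only conditions on polymorphic sites and ignores monomorphic ones.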