A Bayesian Hierarchical Model for Analysis of Single-Nucleotide Polymorphisms Diversity in Multilocus, Multipopulation Samples
- 1 March 2009
- journal article
- Published by Taylor & Francis Ltd in Journal of the American Statistical Association
- Vol. 104 (485), 142-154
- https://doi.org/10.1198/jasa.2009.0010
Abstract
The distribution of genetic variation among populations is conveniently measured by Wright's F_ST, which is a scaled variance taking on values in [0,1]. For certain types of genetic markers and for single-nucleotide polymorphisms (SNPs) in particular, it is reasonable to presume that allelic differences at most loci are selectively neutral. For such loci, the distribution of genetic variation among populations is determined by the size of local populations, the pattern and rate of migration among those populations, and the rate of mutation. Because the demographic parameters (population sizes and migration rates) are common across all autosomal loci, locus-specific estimates of F_ST will depart from a common distribution only for loci with unusually high or low rates of mutation or for loci that are closely associated with genomic regions having a relationship with fitness. Thus, loci that are statistical outliers showing significantly more among-population differentiation than others may mark genomic regions subject to diversifying selection among the sample populations. Similarly, statistical outliers showing significantly less differentiation among populations than others may mark genomic regions subject to stabilizing selection across the sample populations. We propose several Bayesian hierarchical models to estimate locus-specific effects on F_ST, and we apply these models to single-nucleotide polymorphism data from the HapMap project. Because loci that are physically associated with one another are likely to show similar patterns of variation, we introduce conditional autoregressive models to incorporate the local correlation among loci for high-resolution genomic data. We estimate the posterior distributions of model parameters using Markov chain Monte Carlo (MCMC) simulations.
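As a rough illustration of the "scaled variance" reading of F_ST mentioned above, the sketch below computes the among-population variance of an allele frequency divided by p̄(1 − p̄), where p̄ is the mean frequency. This is a simple moment-style definition assumed here for illustration only; it is not the Bayesian hierarchical estimator proposed in the article, and the function name `fst_scaled_variance` is hypothetical.

```python
from statistics import mean, pvariance


def fst_scaled_variance(freqs):
    """Illustrative F_ST for one biallelic locus.

    freqs: allele frequencies of the same allele, one per population.
    Returns Var(p) / (p_bar * (1 - p_bar)), which lies in [0, 1].
    Assumption: equal weighting of populations and no sampling correction.
    """
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):
        # No variation at all, so the scaling term is zero.
        raise ValueError("locus is monomorphic; F_ST is undefined")
    # Population variance of the frequencies around their mean.
    among_pop_var = pvariance(freqs, mu=p_bar)
    return among_pop_var / (p_bar * (1.0 - p_bar))
```

Identical frequencies in every population give 0, and populations fixed for alternative alleles give 1, which matches the [0,1] range stated in the abstract.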
Model comparison using several criteria, including the deviance information criterion (DIC) and the log pseudomarginal likelihood (LPML), reveals that a model with locus- and population-specific effects is superior to other models for the data used in the analysis. To detect statistical outliers, we propose an approach that measures divergence between the posterior distributions of locus-specific effects and the common F_ST with the Kullback-Leibler divergence measure. We calibrate this measure by comparing values with those produced from the divergence between a biased and a fair coin. We conduct a simulation study to illustrate the performance of our approach for detecting loci subject to stabilizing/divergent selection, and we apply the proposed models to low- and high-resolution SNP data from the HapMap project. Model comparison using DIC and LPML reveals that conditional autoregressive (CAR) models are superior to alternative models for the high-resolution data. For both low- and high-resolution data, we identify statistical outliers that are associated with known genes.
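The coin calibration described in the abstract can be sketched as follows. The abstract does not give the formula, so this assumes one common form of the calibration: a Kullback-Leibler value k is matched to the bias q of a coin such that the divergence from a fair coin to that biased coin equals k. The direction of the divergence and both function names are assumptions made for illustration.

```python
import math


def kl_fair_vs_biased(q):
    """KL divergence from a fair coin to a coin with P(heads) = q.

    KL(Bern(0.5) || Bern(q)) = -0.5 * log(4 * q * (1 - q)).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return -0.5 * math.log(4.0 * q * (1.0 - q))


def equivalent_coin_bias(k):
    """Invert the formula above: the bias q >= 0.5 whose divergence is k."""
    if k < 0.0:
        raise ValueError("a KL divergence cannot be negative")
    return 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-2.0 * k)))
```

Under this assumed calibration, a divergence of 0 corresponds to a fair coin (q = 0.5), and larger divergences map to values of q closer to 1, which gives an interpretable scale for deciding how far a locus-specific posterior sits from the common one.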