Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended
Open Access
- 1 September 2010
- journal article
- Published by Oxford University Press (OUP) in Genetics
- Vol. 186 (1), 241-262
- https://doi.org/10.1534/genetics.110.117275
Abstract
Detecting genetic signatures of selection is of great interest for many research issues. Common approaches to separate selective from neutral processes focus on the variance of $F_{ST}$ across loci, as does the original Lewontin and Krakauer (LK) test. Modern developments aim to minimize the false positive rate and to increase power by accounting for complex demographic structures. Another stimulating goal is to develop straightforward, parametric, and computationally tractable tests to deal with massive SNP data sets. Here, we propose an extension of the original LK statistic ($T_{LK}$), named $T_{F\text{-}LK}$, that uses a phylogenetic estimation of the population's kinship ($\mathcal{F}$) matrix, thus accounting for historical branching and heterogeneity of genetic drift. Using forward simulations of single-nucleotide polymorphism (SNP) data under neutrality and selection, we confirm the relative robustness of the LK statistic ($T_{LK}$) to complex demographic history, but we show that $T_{F\text{-}LK}$ is more powerful in most cases. This new statistic also outperforms a multinomial-Dirichlet-based model [estimation with Markov chain Monte Carlo (MCMC)] when historical branching occurs. Overall, $T_{F\text{-}LK}$ detects 15–35% more selected SNPs than $T_{LK}$ for low type I errors (P < 0.001). Simulations also show that $T_{LK}$ and $T_{F\text{-}LK}$ follow a chi-square distribution provided the ancestral allele frequencies are not too extreme, suggesting the possible use of the chi-square distribution for evaluating significance. The empirical distribution of $T_{F\text{-}LK}$ can be derived using simulations conditioned on the estimated $\mathcal{F}$ matrix. We apply this new test to SNP data from pig breeds and pinpoint outliers using $T_{F\text{-}LK}$ that go undetected with the less powerful $T_{LK}$ statistic. This new test represents one possible compromise between advanced SNP genetic data acquisition and outlier analyses.
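The abstract does not spell out either statistic, so the following Python sketch is only an illustration under stated assumptions. It takes the classical Lewontin and Krakauer form, (n - 1) F_ST / mean(F_ST) for n populations, and assumes that the extension is the quadratic form (p - p0 1)' Var(p)^{-1} (p - p0 1) with Var(p) = p0 (1 - p0) F, where p0 is a kinship-weighted estimate of the ancestral allele frequency. The function names `t_lk`, `solve`, and `t_flk`, and all input values, are hypothetical and not taken from the article.

```python
def t_lk(fst_per_locus, n_pops):
    """Classical LK statistic per locus: (n - 1) * F_ST / mean(F_ST)."""
    mean_fst = sum(fst_per_locus) / len(fst_per_locus)
    return [(n_pops - 1) * fst / mean_fst for fst in fst_per_locus]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (small helper to avoid dependencies)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def t_flk(p, f_matrix):
    """Assumed form of the extended statistic for one SNP.

    p        : allele frequencies in the n populations
    f_matrix : n x n kinship matrix F (assumed already estimated)
    """
    n = len(p)
    finv_p = solve(f_matrix, p)
    finv_1 = solve(f_matrix, [1.0] * n)
    # Assumed ancestral frequency estimate: (1' F^-1 p) / (1' F^-1 1)
    p0 = sum(finv_p) / sum(finv_1)
    dev = [pi - p0 for pi in p]
    finv_dev = solve(f_matrix, dev)
    quad = sum(d * w for d, w in zip(dev, finv_dev))
    # Divide by p0 (1 - p0), since Var(p) is assumed to be p0 (1 - p0) F
    return quad / (p0 * (1.0 - p0))


if __name__ == "__main__":
    # Made-up per-locus F_ST values for 5 populations
    print(t_lk([0.05, 0.10, 0.15], n_pops=5))
    # Made-up frequencies and kinship matrix for 2 populations
    print(t_flk([0.3, 0.5], [[0.2, 0.1], [0.1, 0.2]]))
```

Under the chi-square approximation mentioned in the abstract, each value would be compared against a chi-square distribution with n - 1 degrees of freedom, for example with `scipy.stats.chi2.sf(value, n - 1)` if SciPy is available. With a diagonal F matrix of equal entries, this assumed quadratic form reduces to a scaled sum of squared deviations from the mean frequency, the star-like setting without historical branching.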