Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations

Open Access

26 August 2015

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 31 (24), 3946-3952
https://doi.org/10.1093/bioinformatics/btv493

Abstract

Motivation: Detecting positive selection in genomic regions is a recurrent topic in genetic studies of natural populations. However, there is little consistency among the regions detected by genome-wide scans that use different tests and/or populations. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness).

Results: We have developed a machine-learning classification framework that exploits the combined ability of several selection tests to uncover different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while adding insights about their completeness (whether a selected variant is fixed or not) and age of onset. Our method also determines the relevance of the individual methods implemented so far to detect positive selection under specific selective scenarios. We calibrated and applied the method to three reference human populations from The 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves the detection of selective sweeps by moving beyond the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when applied to real data. Very few signals were observed in the African population studied, even though our method has higher sensitivity under this population's demography.

Availability and implementation: The genome-wide results for three human populations from The 1000 Genomes Project and an R-package implementing the ‘Hierarchical Boosting’ framework are available at http://hsb.upf.edu/.

Contact: jaume.bertranpetit@upf.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
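The abstract describes the framework only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation or the API of their R-package. It assumes a two-level hierarchy (neutral versus sweep, then incomplete versus complete sweep) and uses componentwise L2 boosting with one-feature linear base learners as a stand-in boosting algorithm. The function names, the two test scores (a haplotype-based score and a frequency-spectrum-based score, both assumed standardized) and the toy training values are hypothetical and invented purely for illustration; they are not results from the study.

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, List[float]]  # (intercept, one coefficient per test score)


def fit_boosting(X: Sequence[Sequence[float]], y: Sequence[float],
                 n_iter: int = 200, nu: float = 0.1) -> Model:
    """Componentwise L2 boosting: at each step, fit every single score to the
    current residuals by least squares and update only the best-fitting one."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coefs = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_iter):
        best_j, best_beta, best_sse = 0, 0.0, float("inf")
        for j in range(p):
            sxx = sum(row[j] ** 2 for row in X)
            if sxx == 0:
                continue  # a constant-zero score carries no information
            beta = sum(row[j] * r for row, r in zip(X, resid)) / sxx
            sse = sum((r - beta * row[j]) ** 2 for row, r in zip(X, resid))
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        coefs[best_j] += nu * best_beta  # shrunken update of the chosen score
        resid = [r - nu * best_beta * row[best_j] for row, r in zip(X, resid)]
    return intercept, coefs


def score(model: Model, x: Sequence[float]) -> float:
    """Linear score of one genomic window under a fitted model."""
    intercept, coefs = model
    return intercept + sum(c * v for c, v in zip(coefs, x))


def classify_window(x: Sequence[float], sweep_model: Model,
                    state_model: Model) -> str:
    """Walk the hierarchy: first neutral vs sweep, then incomplete vs complete."""
    if score(sweep_model, x) <= 0:
        return "neutral"
    return "complete sweep" if score(state_model, x) > 0 else "incomplete sweep"


# Hypothetical toy training windows: [haplotype_score, frequency_score].
neutral = [[-1.0, -1.0], [-0.8, -1.2], [-1.1, -0.9]]
incomplete = [[1.5, -0.5], [1.2, -0.3], [1.4, -0.6]]
complete = [[-0.5, 1.5], [-0.3, 1.3], [-0.6, 1.4]]

# Level 1 is trained on all windows; level 2 only on windows under a sweep.
sweep_model = fit_boosting(neutral + incomplete + complete,
                           [-1.0] * 3 + [1.0] * 6)
state_model = fit_boosting(incomplete + complete, [-1.0] * 3 + [1.0] * 3)

print(classify_window([1.3, -0.4], sweep_model, state_model))
```

In this sketch the fitted coefficients at each level indicate which test score drives that particular decision, which loosely mirrors the abstract's point that the relevance of individual tests depends on the selective scenario. In practice the training windows would come from simulations under a population-specific demographic model, which this toy example does not attempt.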

