Abstract
Selective sweeps, the genetic footprint of positive selection, have been studied extensively over the past decades, and dozens of methods have been developed to identify swept regions. However, these methods suffer from both false positives and false negatives, and the candidates identified by different methods are often inconsistent with each other. We propose that one biological cause of this problem is population subdivision, and one technical cause is incomplete or inaccurate modeling of the dynamic process associated with sweeps. Here we use simulations to show how these effects interact and potentially cause bias. In particular, we show that sweeps may be misclassified as either hard or soft when the true temporal stage of a sweep does not match the stage implied, or presupposed, by the model. We call this “temporal misclassification”. Similarly, “spatial misclassification (softening)” can occur when hard sweeps that are imported by migration into a new subpopulation are falsely identified as soft. This can easily happen in the case of local adaptation, i.e. when the sweeping allele is not under positive selection in the new subpopulation, and the underlying model assumes panmixis instead of substructure. The claim that most sweeps in the evolutionary history of humans were soft may have to be reconsidered in the light of these findings.
Identifying the traces of adaptive evolution is still difficult, in particular when populations are not in equilibrium. Using forward-in-time simulations, we studied adaptation by selective sweeps in populations that are divided into demes with limited migration among them. We applied different sweep tests, whose sensitivities were found to vary widely across demographic scenarios and temporal stages. First, the temporal stage of a sweep (ongoing vs. completed) significantly affects detection, especially when machine learning algorithms are used and the training and test stages do not match. Second, alleles imported from a neighboring deme with local adaptation can result in spurious sweep signals. In both cases, signals are often detected as “soft sweeps” (adaptation from standing variation) while in fact they are “hard sweeps” (adaptation from a single mutation), originating in the same subpopulation in the former case and in some other subpopulation in the latter case. We call these phenomena “temporal” and “spatial softening”. Finally, under low migration, the time window in which a sweep can be detected becomes very narrow, and power tends to be low. Generally, however, haplotype-based methods seem to be less affected than frequency-spectrum-based tests.
Funding Information
  • Deutsche Forschungsgemeinschaft (SFB1211)
  • Deutsche Forschungsgemeinschaft (SPP1590)