Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution

Open Access

29 May 2020

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Genetics

Vol. 16 (5), e1008827
https://doi.org/10.1371/journal.pgen.1008827

Abstract

Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (N_e s). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e., functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and N_e s. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection.

One of the most significant and challenging tasks in modern genomics is to assess the functional consequences of a particular nucleotide change in a genome. A common approach to address this challenge prioritizes sequences that share similar nucleotides across distantly related species, with the rationale that mutations at such positions were deleterious and removed from the population by purifying natural selection. Our manuscript shows that one popular measure of sequence conservation, the GERP score, performs well at identifying selected mutations if mutations at a site were under selection across all of mammalian evolution. Changes in selection at a given site dramatically reduce the power of GERP to detect selected mutations in humans. We also combine population genetic models with the distribution of GERP scores at noncoding sites across the human genome to show that the degree of selection at individual sites has changed throughout mammalian evolution. Importantly, we demonstrate that at least 80 Mb of noncoding sequence under purifying selection in humans will not have extreme GERP scores and will likely be missed by modern comparative genomic approaches. Our work argues that new approaches, potentially based on genetic variation within species, will be required to identify deleterious mutations.
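
The qualitative claim that turnover weakens the conservation signal can be illustrated with a back-of-the-envelope calculation. The sketch below is not the simulation framework used in the article. It assumes that a GERP-style "rejected substitutions" score equals the substitutions expected under neutrality minus those expected at the site, that the relative substitution rate at a selected site follows the standard diffusion result gamma / (1 - exp(-gamma)), and that turnover can be summarized as the fraction of the tree during which the site is constrained. The function names, the scaled coefficient `gamma` (whose exact relation to N_e s depends on the ploidy and dominance convention), and the tree length of 6.0 are all hypothetical choices made for illustration.

```python
import math


def relative_substitution_rate(gamma: float) -> float:
    """Substitution rate relative to neutrality for a scaled selection
    coefficient gamma <= 0 (purifying selection or neutrality).

    Uses the diffusion-theory result gamma / (1 - exp(-gamma)), rewritten
    as gamma * exp(gamma) / expm1(gamma) so that strongly negative gamma
    underflows to zero instead of overflowing. The mapping from gamma to
    N_e * s is an assumption, not taken from the article.
    """
    if gamma > 0:
        raise ValueError("this sketch only covers gamma <= 0")
    if abs(gamma) < 1e-8:
        return 1.0  # neutral limit
    return gamma * math.exp(gamma) / math.expm1(gamma)


def expected_rejected_substitutions(
    neutral_tree_length: float,
    gamma: float,
    constrained_fraction: float = 1.0,
) -> float:
    """Expected neutral substitutions minus expected substitutions at a site
    that is under selection for only `constrained_fraction` of the tree
    (a crude stand-in for functional turnover) and neutral elsewhere.
    """
    if not 0.0 <= constrained_fraction <= 1.0:
        raise ValueError("constrained_fraction must lie in [0, 1]")
    rate = relative_substitution_rate(gamma)
    expected_at_site = neutral_tree_length * (
        constrained_fraction * rate + (1.0 - constrained_fraction)
    )
    return neutral_tree_length - expected_at_site


if __name__ == "__main__":
    tree_length = 6.0  # hypothetical neutral substitutions per site
    for fraction in (1.0, 0.5, 0.25):
        score = expected_rejected_substitutions(tree_length, -10.0, fraction)
        print(f"constrained fraction {fraction:.2f}: expected score {score:.3f}")
```

Under these assumptions the expected score scales linearly with the fraction of the tree under constraint, so a site that is strongly selected in humans but constrained on only a small part of the mammalian tree can fall below a fixed score threshold.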

Funding Information

National Institute of General Medical Sciences (R35 GM119856)

