Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations
Open Access
- 1 December 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature Communications
- Vol. 11 (1), 1-16
- https://doi.org/10.1038/s41467-020-19962-9
Abstract
Annotations of evolutionary sequence constraint based on multi-species genome alignments, and genome-wide maps of epigenomic marks and transcription factor binding, provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP), which takes over 60,000 epigenomic and transcription factor binding features as input and quantifies the evidence that each base in the genome lies in an evolutionarily constrained non-exonic element. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of these bases is still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.

Genome-wide maps of evolutionary constraint and large-scale compendia of epigenomic and transcription factor data provide complementary information for genome annotation. Here, the authors develop the Constrained Non-Exonic Predictor (CNEP), which enables a better understanding of the relationship between them.

Funding Information
- U.S. Department of Health & Human Services | National Institutes of Health (U01MH105578, T32CA201160, R35GM119856)
- National Science Foundation (1254200)
- Alfred P. Sloan Foundation