Systematic analysis of binding of transcription factors to noncoding variants
- 26 January 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Nature
- Vol. 591 (7848), 147-+
- https://doi.org/10.1038/s41586-021-03211-0
Abstract
Many sequence variants have been linked to complex human traits and diseases (1), but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, termed single-nucleotide polymorphism evaluation by systematic evolution of ligands by exponential enrichment (SNP-SELEX). The resulting 828 million measurements of transcription factor-DNA interactions enable estimation of the relative affinity of these transcription factors to each variant in vitro and evaluation of the current methods to predict the effects of noncoding variants on transcription factor binding. We show that the position weight matrices of most transcription factors lack sufficient predictive power, whereas the support vector machine combined with the gapped k-mer representation shows much improved performance when assessed on results from independent SNP-SELEX experiments involving a new set of 61,020 sequence variants. We report highly predictive models for 94 human transcription factors and demonstrate their utility in genome-wide association studies and understanding of the molecular pathways involved in diverse human traits and diseases.
This publication has 72 references indexed in Scilit:
- An integrated encyclopedia of DNA elements in the human genome. Nature, 2012
- The accessible chromatin landscape of the human genome. Nature, 2012
- Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics, 2012
- A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nature Genetics, 2012
- A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 2011
- Integrating common and rare genetic variation in diverse human populations. Nature, 2010
- Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 2010
- Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell, 2010
- Multiple common variants for celiac disease influencing immune gene expression. Nature Genetics, 2010
- The role of DNA shape in protein–DNA recognition. Nature, 2009