Unsupervised contrastive peak caller for ATAC-seq

Open Access

22 May 2023

journal article
research article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 33 (7), 1133-1144
https://doi.org/10.1101/gr.277677.123

Abstract

The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is commonly used to identify accessible chromatin regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as “peak calling.” Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high-quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
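
The idea of minimizing a contrastive loss over biological replicates can be illustrated with a minimal sketch. The abstract does not specify RCL's loss or architecture, so the code below uses a generic InfoNCE-style loss as a stand-in; the function name `info_nce_loss`, the `temperature` value, and the random toy embeddings are all assumptions made for illustration and are not taken from the paper. Row i of each array stands for the embedding of the same genomic segment in two replicates.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.5):
    """Generic InfoNCE-style loss (illustrative, not the RCL loss).

    Row i of z_a and row i of z_b form a positive pair (same segment,
    different replicate); all other rows of z_b act as negatives.
    """
    # L2-normalise so the dot product is a cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape (n_segments, n_segments)
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative mean log-probability assigned to the matching segment
    return float(-np.mean(np.diag(log_prob)))

# Toy embeddings (hypothetical data): 8 segments, 16 dimensions
rng = np.random.default_rng(0)
rep1 = rng.normal(size=(8, 16))
rep2_matched = rep1 + 0.05 * rng.normal(size=(8, 16))  # replicate agrees
rep2_shuffled = rep2_matched[::-1].copy()               # pairing destroyed

loss_matched = info_nce_loss(rep1, rep2_matched)
loss_shuffled = info_nce_loss(rep1, rep2_shuffled)
print(loss_matched < loss_shuffled)
```

In this sketch the loss is lower when the two replicates agree segment by segment than when the pairing is scrambled, which is the property a replicate-based contrastive objective relies on.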

Funding Information

Eunice Kennedy Shriver National Institute of Child Health & Human Development
National Institutes of Health (R01HD096083)
United States Department of Agriculture
National Institute of Food and Agriculture (IOW03717)
