Specific identification and quantification of circular RNAs from sequencing data

Abstract
Motivation: Circular RNAs (circRNAs) are a poorly characterized class of molecules that were first identified decades ago. Emerging high-throughput sequencing methods and the first reports of confirmed functions have sparked new interest in this RNA species. However, computational tools for their detection and quantification are still limited.
Results: We developed a software tandem, DCC and CircTest. DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates. We assessed the detection performance of DCC on a newly generated mouse brain data set and on publicly available sequencing data. Our software achieves a much higher precision than state-of-the-art competitors at similar sensitivity levels. Moreover, DCC estimates circRNA versus host gene expression by counting junction and non-junction reads. Our R package CircTest then uses these read counts to test whether circRNA expression is independent of host gene expression across different experimental conditions. We demonstrate the benefits of this approach on previously reported age-dependent circRNAs in the fruit fly.
Availability and implementation: The source code of DCC and CircTest is licensed under the GNU General Public License (GPL) version 3 and available from https://github.com/dieterich-lab/[DCC or CircTest].
Contact: christoph.dieterich@age.mpg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
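To illustrate the idea of relating circRNA to host gene expression through junction and non-junction read counts, the following minimal Python sketch computes, per replicate, the fraction of reads that support the back-splice junction and averages it per condition. It is not the DCC or CircTest implementation: the function names, the condition labels and all counts are hypothetical, and the simple ratio shown here stands in for whatever statistical model the packages actually apply.

```python
import math
from typing import Dict, List, Tuple

# One replicate: (back-splice junction reads, non-junction host gene reads).
Counts = Tuple[int, int]


def circ_fraction(junction_reads: int, linear_reads: int) -> float:
    """Fraction of reads at a site that support the circular junction."""
    total = junction_reads + linear_reads
    if total == 0:
        return math.nan  # no coverage, so the ratio is undefined
    return junction_reads / total


def mean_fraction(replicates: List[Counts]) -> float:
    """Average the per-replicate circular fraction within one condition."""
    fractions = [circ_fraction(j, l) for j, l in replicates]
    return sum(fractions) / len(fractions)


# Hypothetical read counts for a single circRNA candidate (made-up numbers).
counts: Dict[str, List[Counts]] = {
    "young": [(5, 95), (4, 96)],
    "old": [(20, 80), (18, 82)],
}

summary = {condition: mean_fraction(reps) for condition, reps in counts.items()}

for condition, fraction in summary.items():
    print(f"{condition}: mean circular fraction = {fraction:.3f}")
```

A shift in this fraction between conditions suggests that circRNA abundance changes relative to its host gene rather than merely following it; deciding whether such a shift is significant requires a proper statistical test across replicates, which this sketch does not attempt.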