Intra-Exon Motif Correlations as a Proxy Measure for Mean Per-Tile Sequence Quality Data in RNA-Seq
- 1 February 2023
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 30 (2), 131-148
- https://doi.org/10.1089/cmb.2021.0476
Abstract
Given the wide variability in the quality of next-generation sequencing data submitted to public repositories, it is essential to identify methods that can perform quality control on these data sets when additional quality control data, such as mean tile data, are missing from the repositories. In this study, we present evidence that correlating counts of reads corresponding to pairs of motifs separated by specific distances on individual exons can serve as a proxy for mean tile data in the data sets we analyzed, and hence could be used when mean tile data are not available. As test data sets, we use the Homo sapiens in vitro transcribed (IVT) data set and a Drosophila melanogaster data set comprising wild-type and mutant samples. A FastQC analysis of the available parts of these data sets shows that per-tile sequencing quality is good for all data sets except the mutant-type data, where the mutant-r3 data are worse than the mutant-r2 data. Correspondingly, intra-exon motif correlations are reasonably large for all data sets except this latter case, where the mutant-r2 correlations are low and the mutant-r3 correlations are close to zero. We propose that these extremely low correlations indicate bias of technical origin, such as flowcell errors. In addition, the intra-exon motif correlations, as a function of both guanosine-cytosine (GC) content parameters, are somewhat higher and less dependent on those parameters in the IVT-Plasmids sample, an RNA-Seq control that did not undergo messenger RNA (mRNA) selection, than in the other RNA-Seq samples, which underwent mRNA selection by either ribosomal depletion (IVT-Only) or PolyA selection (IVT-PolyA, wild type, and mutant).
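The abstract does not spell out how the intra-exon motif correlation is computed. The Python sketch below shows one plausible reading under stated assumptions, and it is not the authors' implementation. It assumes that each exon is given as its sequence plus a per-position read count (for example, the number of reads starting at each base), that every occurrence of `motif_a` followed `distance` bases later by `motif_b` contributes one pair of counts, and that a Pearson correlation is taken over all such pairs pooled across exons. The function names (`motif_positions`, `pearson`, `intra_exon_motif_correlation`) and the count and pooling definitions are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# Hypothetical input type: (exon sequence, per-position read counts).
Exon = Tuple[str, Sequence[int]]


def motif_positions(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; NaN if fewer than two pairs or either side is constant."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def intra_exon_motif_correlation(
    exons: Iterable[Exon], motif_a: str, motif_b: str, distance: int
) -> float:
    """Correlate read counts at motif_a with counts at motif_b located `distance` bases downstream.

    Assumption (not from the paper): pairs are only formed within a single exon,
    and the counts compared are those at the motif start positions.
    """
    xs: List[float] = []
    ys: List[float] = []
    for seq, counts in exons:
        if len(counts) != len(seq):
            raise ValueError("counts must have one entry per base of the exon")
        for i in motif_positions(seq, motif_a):
            j = i + distance
            # Keep the pair only if motif_b starts exactly `distance` bases later
            # and lies entirely inside the same exon.
            if j + len(motif_b) <= len(seq) and seq.startswith(motif_b, j):
                xs.append(counts[i])
                ys.append(counts[j])
    return pearson(xs, ys)


if __name__ == "__main__":
    toy_exons = [
        ("ACGTAAACGTAA", [1, 0, 2, 0, 0, 0, 3, 0, 6, 0, 0, 0]),
        ("ACGTCC", [5, 0, 10, 0, 0, 0]),
    ]
    print(intra_exon_motif_correlation(toy_exons, "AC", "GT", 2))
```

In the toy example the downstream counts are exactly twice the upstream counts, so the sketch returns a correlation of 1.0. Under this reading, a value near zero for a real sample would correspond to the low correlations that the abstract associates with possible technical bias such as flowcell errors.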