Multiple imputation and direct estimation for qPCR data with non-detects
Open Access
- 26 December 2020
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 21 (1), 1-15
- https://doi.org/10.1186/s12859-020-03807-9
Abstract
Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference.
Results: We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects.
Conclusions: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.
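To make the idea of multiple imputation concrete, the following Python sketch fills each non-detect several times with a plausible Ct value and pools the resulting estimates with Rubin's rules. It is an illustration under simplifying assumptions, not the model from the article and not the API of the nondetects package: non-detects are treated as strictly censored above a hypothetical cycle limit, the imputation mean and standard deviation (`mu`, `sigma`) are supplied by the caller rather than estimated, and all function names are made up for this example.

```python
import random
import statistics
from statistics import NormalDist


def draw_truncated_normal(mu, sigma, lower, rng):
    """Draw from N(mu, sigma^2) conditioned on the value exceeding `lower`."""
    dist = NormalDist(mu, sigma)
    p_lower = dist.cdf(lower)
    # Inverse-CDF sampling restricted to the upper tail.
    u = rng.uniform(p_lower, 1.0)
    # inv_cdf requires 0 < p < 1, so keep u strictly inside that interval.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return max(lower, dist.inv_cdf(u))


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


def impute_and_pool(ct_values, limit, mu, sigma, m=20, seed=0):
    """Estimate the mean Ct, imputing non-detects (None) m times.

    Assumption: a non-detect means the true Ct lies above `limit`.
    Holding mu and sigma fixed ignores parameter uncertainty, so this
    is a simplified sketch rather than a fully proper procedure.
    """
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = [
            ct if ct is not None
            else draw_truncated_normal(mu, sigma, limit, rng)
            for ct in ct_values
        ]
        estimates.append(statistics.fmean(completed))
        # Variance of the sample mean for this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    # Hypothetical replicate Ct values; None marks a non-detect.
    cts = [36.2, 37.5, None, 38.1, None]
    mean_ct, var_ct = impute_and_pool(cts, limit=40.0, mu=38.0, sigma=1.5)
    print(f"pooled mean Ct: {mean_ct:.2f}, total variance: {var_ct:.3f}")
```

The between-imputation term in `pool_rubin` is what a single imputation discards; leaving it out is one way the residual variance ends up underestimated.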
Funding Information
- National Human Genome Research Institute (R00HG006853)
- National Cancer Institute (CA138249, CA197562)
- National Center for Advancing Translational Sciences (UL1TR002001)