Improved quality control processing of peptide-centric LC-MS proteomics data

Open Access

18 August 2011

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 27 (20), 2866-2872
https://doi.org/10.1093/bioinformatics/btr479

Abstract

Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor-quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation of an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of peptide identification with respect to spectral processing, quality assessment of the subsequent peptide abundance data matrix has to date been limited to subjective visual inspection of run-by-run correlation or of individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data because many downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values.

Results: We describe a novel multivariate statistical strategy for the identification of LC-MS runs with extreme peptide abundance distributions. Comparison with the current method (run-by-run correlation) demonstrates a significantly better rate of identification of outlier runs by the multivariate strategy. Simulation studies also suggest that this strategy significantly outperforms correlation alone in the identification of statistically extreme liquid chromatography-mass spectrometry (LC-MS) runs.

Availability: https://www.biopilot.org/docs/Software/RMD.php

Contact: bj@pnl.gov

Supplementary information: Supplementary material is available at Bioinformatics online.
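For orientation, the baseline that the abstract calls run-by-run correlation can be sketched roughly as follows. This is a minimal illustration and not the authors' multivariate strategy, which the abstract does not specify in detail. The function name mean_run_correlation, the matrix layout (peptides in rows, runs in columns, NaN for unobserved peptides) and the simulated data are all assumptions made for this example.

```python
import numpy as np


def mean_run_correlation(log_abundance):
    """Mean Pearson correlation of each run (column) with every other run.

    log_abundance: 2-D array, peptides in rows and LC-MS runs in columns,
    with NaN where a peptide was not observed in a run (assumed layout).
    """
    x = np.asarray(log_abundance, dtype=float)
    n_runs = x.shape[1]
    corr = np.full((n_runs, n_runs), np.nan)  # diagonal stays NaN
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            # Use only peptides observed in both runs.
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            if both.sum() >= 3:
                r = np.corrcoef(x[both, i], x[both, j])[0, 1]
                corr[i, j] = corr[j, i] = r
    # Average over the other runs, ignoring the NaN diagonal.
    return np.nanmean(corr, axis=1)


# Simulated data (hypothetical): 200 peptides, 6 runs, run 4 unrelated.
rng = np.random.default_rng(0)
peptide_level = rng.normal(20.0, 2.0, size=(200, 1))
runs = peptide_level + rng.normal(0.0, 0.3, size=(200, 6))
runs[:, 4] = rng.normal(20.0, 2.0, size=200)

mean_r = mean_run_correlation(runs)
suspect_run = int(np.argmin(mean_r))
```

In practice a run with a conspicuously low mean correlation would be inspected by eye, which is the subjective step the abstract argues a multivariate criterion should replace.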

Cited by 86 articles