Assessing Effects of Pre-Processing Mass Spectrometry Data on Classification Performance

1 April 2008

journal article
Published by SAGE Publications in European Journal of Mass Spectrometry

Vol. 14 (5), 267-273
https://doi.org/10.1255/ejms.938

Abstract

Disease prediction from mass spectrometry (MS) data is gaining importance in medical diagnosis. In cancer in particular, early prediction can be life-saving. The high dimensionality and noisy nature of MS data require a two-phase approach for successful disease prediction. First, the MS data must be pre-processed through stages such as baseline correction, normalization, de-noising and peak detection. Second, a classifier based on dimension reduction must be designed. Once the data have been pre-processed, the prediction accuracy of the classification algorithm becomes the most significant factor in the medical diagnosis phase, and because patient health is at stake, that accuracy is critical. This study addresses the effects of the pre-processing stages of MS data on classifier performance. Three pre-processing stages (baseline correction, normalization and de-noising) are applied to three MS data sets: a high-resolution ovarian cancer set, a low-resolution prostate cancer set and a low-resolution ovarian cancer set. To measure the effects of the pre-processing stages quantitatively, four diverse classifiers are applied to the data sets: a genetic algorithm wrapped K-nearest neighbor classifier (GA-KNN), principal component analysis-based linear discriminant analysis (PCA-LDA), a neural network (NN) and a support vector machine (SVM). The calculated classifier performances quantify the effects of the pre-processing stages and demonstrate their importance for the prediction accuracy of the classifiers.
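
As a rough illustration of the three pre-processing stages named in the abstract, the sketch below chains baseline correction, normalization and de-noising on a single spectrum. The abstract does not say which algorithms or parameters the study used, so everything here is an assumption: the rolling-minimum baseline estimate, the total-ion-current normalization, the moving-average smoother, the window sizes and all function names are hypothetical choices made for illustration only.

```python
from typing import List


def correct_baseline(intensities: List[float], window: int = 5) -> List[float]:
    """Subtract a rolling-minimum baseline estimate (assumed method)."""
    n = len(intensities)
    half = window // 2
    corrected = []
    for i in range(n):
        # Local window is clipped at the spectrum edges.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


def normalize_tic(intensities: List[float]) -> List[float]:
    """Scale intensities so they sum to 1 (total ion current, assumed method)."""
    total = sum(intensities)
    if total == 0:
        # A flat, fully corrected spectrum has nothing to scale.
        return list(intensities)
    return [x / total for x in intensities]


def denoise_moving_average(intensities: List[float], window: int = 3) -> List[float]:
    """Smooth with a centered moving average (assumed method)."""
    n = len(intensities)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(intensities[lo:hi]) / (hi - lo))
    return smoothed


def preprocess(intensities: List[float],
               baseline_window: int = 5,
               smooth_window: int = 3) -> List[float]:
    """Apply the stages in the order the abstract lists them."""
    corrected = correct_baseline(intensities, baseline_window)
    normalized = normalize_tic(corrected)
    return denoise_moving_average(normalized, smooth_window)


if __name__ == "__main__":
    # Toy intensities, not taken from any of the study's data sets.
    spectrum = [10.0, 11.0, 10.5, 30.0, 12.0, 11.0, 10.0, 10.2]
    print(preprocess(spectrum))
```

Keeping each stage as a separate function makes it easy to switch stages on or off and compare classifier accuracy with and without each one, which is the kind of comparison the study describes.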
