Application of Fuzzy c-Means Clustering in Data Analysis of Metabolomics
- 1 May 2009
- journal article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 81 (11), 4468-4475
- https://doi.org/10.1021/ac900353t
Abstract
Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to cluster metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix which represents the degree of association the samples have with each cluster. The method is parametrized with the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both have been optimized by combining FCM with partial least-squares (PLS) using the membership matrix as the Y matrix in the PLS model. The quality parameters R2Y and Q2 of the PLS model have been used to monitor and optimize C and m. Data of metabolic profiles from three gene types of Escherichia coli were used to demonstrate the method above. Different multivariable analysis methods have been compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded results with overfitting. On the basis of the optimized parameters, the FCM was able to reveal main phenotype changes and individual characters of three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
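To make the roles of the membership matrix, C, and m concrete, the following is a minimal sketch of the standard FCM update rules, not the authors' implementation. It assumes NumPy, Euclidean distances, and a random initial membership matrix; the function name `fuzzy_c_means` and the parameters `max_iter`, `tol`, and `seed` are hypothetical names chosen for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Illustrative FCM: returns (membership matrix U, cluster centers).

    X          : (n_samples, n_features) data matrix
    n_clusters : number of clusters (C in the abstract)
    m          : fuzziness coefficient, must be > 1
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Cluster centers: means of the samples weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance of every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero

        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Stop once the memberships no longer change appreciably.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, centers
```

Each row of the returned `U` holds one sample's degrees of association with the C clusters and sums to 1. Values of m close to 1 push the memberships toward a hard assignment, while larger values make them more diffuse. A matrix of this shape is what the abstract describes using as the Y matrix in the PLS model.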