Application of Fuzzy c-Means Clustering in Data Analysis of Metabolomics

1 May 2009

journal article
Published by American Chemical Society (ACS) in Analytical Chemistry

Vol. 81 (11), 4468-4475
https://doi.org/10.1021/ac900353t

Abstract

Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix, which represents the degree of association each sample has with each cluster. The method is parametrized by the number of clusters (C) and the fuzziness coefficient (m), which sets the degree of fuzziness in the algorithm. Both were optimized by combining FCM with partial least-squares (PLS), using the membership matrix as the Y matrix in the PLS model. The quality parameters R²Y and Q² of the PLS model were used to monitor and optimize C and m. Metabolic profile data from three gene types of Escherichia coli were used to demonstrate the method, and different multivariate analysis methods were compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded overfitted results. With the optimized parameters, FCM was able to reveal the main phenotype changes and individual characteristics of the three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
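The membership matrix described in the abstract can be illustrated with a minimal sketch of the standard fuzzy c-means iteration (as given in the FCM literature, not necessarily the authors' exact implementation). The function name `fuzzy_c_means`, its default values, and the toy data are assumptions made for illustration; the PLS-based selection of C and m is not shown.

```python
import random


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of samples (each a list of floats).

    Returns (centers, U), where U[i][k] is the membership of sample i in
    cluster k and every row of U sums to 1. Hypothetical helper, m > 1.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Cluster centers: means of the samples weighted by membership**m.
        W = [[u ** m for u in row] for row in U]
        centers = []
        for k in range(c):
            wk = sum(W[i][k] for i in range(n))
            centers.append(
                [sum(W[i][k] * X[i][j] for i in range(n)) / wk for j in range(d)]
            )

        # Membership update: u_ik is proportional to
        # (squared distance to center k) ** (-1 / (m - 1)).
        new_U = []
        for i in range(n):
            d2 = [
                max(sum((X[i][j] - centers[k][j]) ** 2 for j in range(d)), 1e-12)
                for k in range(c)
            ]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            total = sum(inv)
            new_U.append([v / total for v in inv])

        # Stop once no membership changes by more than tol.
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U


# Toy data (made up): two tight groups plus one sample halfway between them.
X = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
    [2.55, 2.55],
]
centers, U = fuzzy_c_means(X, c=2, m=2.0)
for sample, memberships in zip(X, U):
    print(sample, [round(u, 3) for u in memberships])
```

In this toy run the last sample receives nearly equal membership in both clusters, which is the kind of ambiguity that a hard clustering would hide. Larger values of m flatten the memberships toward 1/C, while values close to 1 approach a hard partition.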

Cited by 56 articles