Integrative disease classification based on cross-platform microarray data
Open Access
- 30 January 2009
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 10 (S1), S25
- https://doi.org/10.1186/1471-2105-10-s1-s25
Abstract
Background: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, because microarray data generated by different laboratories or on different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification.
Results: In this study, we tested the feasibility of disease classification by integrating the large number of heterogeneous microarray datasets available in public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets; vectors of log-rank ratios can then be compared across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We design a new classification approach named ManiSVM, which integrates manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving accuracy higher than 80%.
Conclusion: Our results not only demonstrate the feasibility of the integrated disease classification approach, but also show that classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will increase with the continuous accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data.
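The idea of deriving log-rank ratios within a dataset can be illustrated with a small sketch. The abstract does not give the exact formula, so the definition used here is an assumption: each gene is ranked within each sample, and the per-gene value is the log2 ratio of its mean rank in disease samples to its mean rank in control samples. The function names `ranks` and `log_rank_ratios` are hypothetical and are not taken from the paper.

```python
import math


def ranks(values):
    """Return 1-based ranks of values within one sample (ties get the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def log_rank_ratios(case_samples, control_samples):
    """Assumed definition: per-gene log2(mean case rank / mean control rank).

    Each sample is a list of expression values, one per gene, with genes in
    the same order across all samples of the dataset.
    """
    case_ranks = [ranks(s) for s in case_samples]
    ctrl_ranks = [ranks(s) for s in control_samples]
    n_genes = len(case_samples[0])
    out = []
    for g in range(n_genes):
        mean_case = sum(r[g] for r in case_ranks) / len(case_ranks)
        mean_ctrl = sum(r[g] for r in ctrl_ranks) / len(ctrl_ranks)
        out.append(math.log2(mean_case / mean_ctrl))
    return out


# Toy dataset on one platform: 3 genes, 2 disease and 2 control samples.
case_a = [[10, 200, 50], [12, 220, 40]]
ctrl_a = [[100, 20, 60], [90, 25, 55]]

# The same biology measured on a platform with a different intensity scale.
case_b = [[x * 1000 + 5 for x in s] for s in case_a]
ctrl_b = [[x * 1000 + 5 for x in s] for s in ctrl_a]

vector_a = log_rank_ratios(case_a, ctrl_a)
vector_b = log_rank_ratios(case_b, ctrl_b)
```

Because ranks are unchanged by any strictly increasing rescaling of intensities, `vector_a` and `vector_b` are identical in this toy case, which is the property that makes such vectors comparable across datasets. Real platforms differ in more than scale, so this shows only the basic intuition.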