Classification and feature selection algorithms for multi-class CGH data
Open Access
- 1 July 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13), i86-i95
- https://doi.org/10.1093/bioinformatics/btn145
Abstract
Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by high-dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains. In this article, we develop novel SVM-based methods for classification and feature selection of CGH data. For classification, we developed a novel similarity kernel that is shown to be more effective than the standard linear kernel used in SVM. For feature selection, we propose a novel method based on the new kernel that iteratively selects the features that provide the maximum benefit for classification. We compared our methods against the best wrapper-based and filter-based approaches that have been used for feature selection of high-dimensional biological data. Our results on datasets generated from the Progenetix database suggest that our methods are considerably superior to existing methods.
Availability: All software developed in this article can be downloaded from http://plaza.ufl.edu/junliu/feature.tar.gz
Contact: juliu@cise.ufl.edu
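The abstract describes two ingredients: a similarity measure over arrays of gain/loss intervals, and an iterative procedure that adds the feature giving the largest benefit for classification. The abstract does not define the kernel or the selection criterion, so the sketch below is only an illustration under stated assumptions. The encoding (1 = gain, -1 = loss, 0 = no change), the function names, the shared-aberration count used as similarity, and the leave-one-out nearest-neighbour score used in place of an SVM-based criterion are all hypothetical choices, not the authors' method.

```python
from typing import List, Sequence

# Assumed encoding, one value per genomic interval:
# 1 = gain, -1 = loss, 0 = no change.
Sample = Sequence[int]


def shared_aberrations(a: Sample, b: Sample) -> int:
    """Count intervals where both samples carry the same aberration."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same intervals")
    return sum(1 for x, y in zip(a, b) if x != 0 and x == y)


def gram_matrix(samples: List[Sample]) -> List[List[int]]:
    """Pairwise similarity matrix, the form a kernel method would consume."""
    return [[shared_aberrations(a, b) for b in samples] for a in samples]


def loo_accuracy(samples: List[Sample], labels: List[str],
                 features: List[int]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy on the chosen intervals.

    This is a cheap stand-in for an SVM-based benefit score.
    """
    correct = 0
    for i, s in enumerate(samples):
        best_j, best_sim = -1, -1
        for j, t in enumerate(samples):
            if j == i:
                continue
            sim = shared_aberrations([s[f] for f in features],
                                     [t[f] for f in features])
            if sim > best_sim:
                best_j, best_sim = j, sim
        correct += labels[best_j] == labels[i]
    return correct / len(samples)


def greedy_select(samples: List[Sample], labels: List[str],
                  k: int) -> List[int]:
    """Greedy forward selection: add the interval that helps most each round."""
    selected: List[int] = []
    remaining = list(range(len(samples[0])))
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda f: loo_accuracy(samples, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): two samples per class.
samples = [
    [1, 1, 0, 0, -1],
    [1, 1, 0, 0, 0],
    [0, 0, -1, -1, 0],
    [0, 0, -1, -1, -1],
]
labels = ["A", "A", "B", "B"]

K = gram_matrix(samples)
chosen = greedy_select(samples, labels, k=2)
```

A matrix like `K` is the shape of input that SVM implementations accepting a precomputed kernel expect. Whether this particular similarity is a valid (positive semidefinite) kernel in general is not checked here.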