Cancer Classification using Support Vector Machines and Relevance Vector Machine based on Analysis of Variance Features

Abstract
Problem statement: The objective of this study is to find the smallest set of genes that can ensure highly accurate classification of cancer from microarray data using supervised machine learning algorithms. The significance of finding the minimum subset is threefold: the computational burden and the noise arising from irrelevant genes are greatly reduced; the cost of cancer testing is reduced significantly, because gene expression tests need to include only a very small number of genes rather than thousands; and it calls for further investigation into the probable biological relationship between this small number of genes and cancer development and treatment. Approach: The proposed method involves two steps. In the first step, a set of important genes was chosen using an Analysis of Variance (ANOVA) ranking scheme. In the second step, the classification capability was tested for all simple combinations of those important genes using a better classifier. Results: The proposed method initially uses a Support Vector Machine (SVM) classifier. A Relevance Vector Machine (RVM) classifier was then used to improve the classification accuracy over the SVM classifier. Conclusion: The experimental results show that the proposed method performs cancer classification with better accuracy than conventional methods.
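The two-step approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic expression matrix in place of real microarray data, and covers only the SVM stage, since scikit-learn does not ship an RVM classifier. The function names `rank_genes_by_anova` and `best_gene_subset`, the parameters `top_k`, `max_size`, and `cv`, and the choice of a linear kernel with cross-validated accuracy as the selection criterion are all hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def rank_genes_by_anova(X, y, top_k):
    """Step 1: return indices of the top_k genes by one-way ANOVA F-statistic."""
    f_scores, _ = f_classif(X, y)
    f_scores = np.nan_to_num(f_scores)  # constant genes would give NaN scores
    return np.argsort(f_scores)[::-1][:top_k]


def best_gene_subset(X, y, candidates, max_size=2, cv=5):
    """Step 2: score every combination of up to max_size candidate genes
    with a linear SVM and keep the first smallest subset with the best score."""
    best_subset, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            clf = SVC(kernel="linear")  # assumed kernel, not from the paper
            score = cross_val_score(clf, X[:, list(subset)], y, cv=cv).mean()
            if score > best_score:  # strict '>' prefers smaller subsets on ties
                best_subset, best_score = subset, score
    return best_subset, best_score


# Synthetic stand-in for microarray data: 60 samples, 200 genes,
# with genes 3 and 17 made informative by shifting their mean in class 1.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, 3] += 3.0
X[y == 1, 17] += 3.0

top = rank_genes_by_anova(X, y, top_k=5)
subset, acc = best_gene_subset(X, y, top, max_size=2)
print("top-ranked genes:", top)
print("selected subset:", subset, "cross-validated accuracy:", round(acc, 3))
```

Because the exhaustive search grows combinatorially with the number of candidate genes, the ANOVA ranking in the first step is what keeps the second step tractable. Note also that ranking genes on the full data set before cross-validation, as this sketch does for brevity, gives optimistic accuracy estimates; a careful evaluation would repeat the ranking inside each training fold.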