A robust statistical procedure to discover expression biomarkers using microarray genomic expression data

Abstract
Microarrays have become an increasingly popular technology in biological and medical research and have been widely used to classify treatment subtypes from the expression patterns of biomarkers. We developed a statistical procedure that identifies expression biomarkers for treatment subtype classification using an F-statistic constructed from Henderson's Method III. Monte Carlo simulations were conducted to examine the robustness and efficiency of the proposed method. The simulation results showed that our method achieved satisfactory power to identify differentially expressed genes (DEGs) while keeping the false discovery rate (FDR) below the nominal type I error rate. We also analyzed a leukemia dataset collected from 38 leukemia patients, of whom 27 were diagnosed with acute lymphoblastic leukemia (ALL) and 11 with acute myeloid leukemia (AML), and compared our results with those obtained from significance analysis of microarrays (SAM) and microarray analysis of variance (MAANOVA). Of these three methods, only the expression biomarkers identified by our method precisely distinguished the three human acute leukemia subtypes.
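The abstract does not state the linear model behind the F-statistic. The Python below is a minimal sketch that assumes a purely fixed-effects linear model fitted separately to each gene. It illustrates only the reduction-in-sums-of-squares idea on which Henderson's Method III is built: the treatment effect is tested with R(treatment | other effects) = R(full) − R(null), divided by its degrees of freedom, over the residual mean square. The helper names (`reduction_ss`, `method3_f`, `genewise_f`), the intercept-only null design, and the simulated data are hypothetical illustrations, not the authors' procedure. Random effects, array or dye terms, and the FDR control evaluated in the paper are not modeled.

```python
import numpy as np

# Hypothetical sketch: fixed-effects model only; not the authors' implementation.


def reduction_ss(y, X):
    """R(X): sum of squares of y explained by the column space of X.

    Computed as y'Py, where P is the orthogonal projector onto col(X);
    lstsq tolerates rank-deficient (overparameterized) ANOVA designs.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return float(fitted @ fitted)


def method3_f(y, X_null, X_trt):
    """F = [R(trt | null) / df_trt] / [SSE / df_err] for one gene (hypothetical helper)."""
    X_full = np.hstack([X_null, X_trt])
    r_full = reduction_ss(y, X_full)
    r_null = reduction_ss(y, X_null)
    df_trt = np.linalg.matrix_rank(X_full) - np.linalg.matrix_rank(X_null)
    df_err = len(y) - np.linalg.matrix_rank(X_full)
    sse = float(y @ y) - r_full  # residual sum of squares of the full model
    return ((r_full - r_null) / df_trt) / (sse / df_err)


def genewise_f(expr, X_null, X_trt):
    """Apply method3_f to each row (gene) of a genes x samples matrix."""
    return np.array([method3_f(row, X_null, X_trt) for row in expr])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.array([0] * 27 + [1] * 11)      # group sizes mirror the 27 ALL / 11 AML split
    X_null = np.ones((labels.size, 1))          # intercept only
    X_trt = np.eye(2)[labels]                   # one indicator column per subtype
    expr = rng.normal(size=(200, labels.size))  # simulated, not real, expression values
    expr[:10, labels == 1] += 3.0               # first 10 genes are truly differentially expressed
    f = genewise_f(expr, X_null, X_trt)
    print("top-ranked genes:", np.argsort(f)[::-1][:10])
```

For a one-way layout this statistic reduces to the classical one-way ANOVA F. Writing it as a difference of reductions makes it straightforward to add nuisance terms to `X_null` without changing how the test is formed.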