High Confidence Rule Mining for Microarray Analysis

Abstract

We present an association rule mining method that extracts high-confidence rules describing interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical because they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms use the support measure to prune infrequent relationships and thereby reduce the search space. Relying on support is a major shortcoming: it prunes many potentially interesting rules that have low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high-confidence rules from microarray data. MaxConf is a support-free algorithm that uses the confidence measure directly to prune the search space effectively. Experiments on three microarray data sets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
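To make the distinction between support and confidence concrete, the following is a minimal sketch that uses the standard definitions of the two measures on a toy data set. It is not the MaxConf algorithm, which the abstract does not specify. The item names (such as geneA_up), the ten toy rows, and the 0.3 support threshold are all assumptions chosen for illustration.

```python
from typing import FrozenSet, List

# Each row stands for one experiment; each item is a hypothetical
# discretized gene state (illustrative names, not from the paper).
Row = FrozenSet[str]

ROWS: List[Row] = (
    [frozenset({"geneA_up", "geneB_up"})] * 2   # A and B up together
    + [frozenset({"geneB_up"})] * 3             # B up without A
    + [frozenset({"geneC_up"})] * 5             # unrelated rows
)


def rule_support(rows: List[Row], items: FrozenSet[str]) -> float:
    """Fraction of all rows that contain every item in `items`."""
    return sum(1 for row in rows if items <= row) / len(rows)


def rule_confidence(rows: List[Row], antecedent: FrozenSet[str],
                    consequent: FrozenSet[str]) -> float:
    """Among rows containing the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if no row matches."""
    covered = [row for row in rows if antecedent <= row]
    if not covered:
        return 0.0
    return sum(1 for row in covered if consequent <= row) / len(covered)


A = frozenset({"geneA_up"})
B = frozenset({"geneB_up"})
MIN_SUPPORT = 0.3  # assumed threshold for a support-based miner

sup = rule_support(ROWS, A | B)        # 2 of 10 rows
conf = rule_confidence(ROWS, A, B)     # 2 of 2 rows containing A
pruned_by_support = sup < MIN_SUPPORT  # True: the rule would be discarded
```

In this toy data the rule geneA_up -> geneB_up holds in every row where geneA_up appears (confidence 1.0) but covers only two of ten rows (support 0.2), so a miner with a minimum support of 0.3 would discard it. This is the kind of low-support, high-confidence rule the abstract describes.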
