Feature selection for gene function prediction using multi-labelled lazy learning

Abstract
In multi-label learning, each instance in the training set is associated with a set of labels, and the task is to predict, for each unseen instance, a label set whose size is not known a priori. In this paper, we propose a feature selection method for multi-label learning based on mutual information. Specifically, we use the distribution of mutual information to select features in multi-label problems. Our experiments were conducted with a multi-label lazy learning approach named ML-kNN, which is derived from the traditional k-Nearest Neighbour (kNN) algorithm. Experimental results on a real-world multi-label bioinformatics dataset show that ML-kNN with feature selection greatly outperforms the original ML-kNN algorithm.
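As a rough illustration of mutual-information-based feature selection in a multi-label setting, the following minimal sketch scores each discrete feature by its empirical mutual information with every label and keeps the highest-scoring features. The function names (`mutual_information`, `rank_features`, `select_top_k`), the toy data, and in particular the choice to aggregate per-label scores by their mean are assumptions made for illustration only; the abstract does not specify how the distribution of mutual information is used, so this should not be read as the method proposed in the paper.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)             # marginal counts of x
    py = Counter(ys)             # marginal counts of y
    pxy = Counter(zip(xs, ys))   # joint counts of (x, y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_features(X, Y):
    """Score each feature by its mean mutual information over all labels.

    Averaging over labels is an illustrative assumption, not the paper's rule.
    """
    n_features = len(X[0])
    n_labels = len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        per_label = [
            mutual_information(column, [row[l] for row in Y])
            for l in range(n_labels)
        ]
        scores.append(sum(per_label) / n_labels)
    return scores


def select_top_k(X, Y, k):
    """Return the indices (ascending) of the k highest-scoring features."""
    scores = rank_features(X, Y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])


# Toy data (hypothetical): feature 0 copies label 0, feature 1 copies label 1,
# and feature 2 is constant and therefore carries no information.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
selected = select_top_k(X, Y, 2)
print(selected)  # [0, 1]
```

The selected feature indices would then be used to reduce the input representation before running a multi-label learner such as ML-kNN. Continuous features would first need to be discretised, or a different estimator of mutual information used.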