Discovering Relevance-Dependent Bicluster Structure from Relational Data: A Model and Algorithm

Abstract
We propose a statistical model for relevance-dependent biclustering to analyze relational data. The proposed model factorizes relational data into a bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster, and (2) every cluster is related to at least one dense block. These features make each cluster easier to interpret, because only a few highly relevant objects need to be inspected. We introduce the Relevance-Dependent Bernoulli Distribution (R-BD) as a prior for relevance-dependent binary matrices and propose the novel Relevance-Dependent Infinite Biclustering (R-IB) model, which automatically estimates the number of clusters. Posterior inference can be performed efficiently with a collapsed Gibbs sampler, because the parameters of the R-IB model can be fully marginalized out. Experimental results show that R-IB extracts a more essential bicluster structure, with better computational efficiency, than conventional models. We further observed that the biclustering results obtained by R-IB make the meaning of each cluster easier to interpret.
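The abstract does not give the R-BD or R-IB likelihood, so the sketch below illustrates only the general conjugacy idea that makes a collapsed Gibbs sampler possible. It assumes, as a simplification that is not taken from the paper, that a single dense block has its own Bernoulli density with a Beta(a, b) prior. Under that assumption the density integrates out in closed form, and a sampler needs only the block's counts of ones and zeros. The function names (`log_marginal_block`, `predictive_one`) are hypothetical.

```python
from math import exp, isclose, lgamma


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_block(n1: int, n0: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal likelihood of a binary block with n1 ones and n0 zeros.

    Assumption (illustrative only): the block density theta ~ Beta(a, b) and each
    entry ~ Bernoulli(theta); theta is integrated out analytically.
    """
    return log_beta(a + n1, b + n0) - log_beta(a, b)


def predictive_one(n1: int, n0: int, a: float = 1.0, b: float = 1.0) -> float:
    """Collapsed predictive probability that one more entry in the block is 1."""
    return (a + n1) / (a + b + n1 + n0)


# A dense block with 9 ones and 1 zero under a uniform Beta(1, 1) prior.
n1, n0 = 9, 1

# Adding a "1" changes the marginal by exactly the collapsed predictive probability,
# so a Gibbs step only needs count updates, never an explicit theta.
ratio = exp(log_marginal_block(n1 + 1, n0) - log_marginal_block(n1, n0))
print(isclose(ratio, predictive_one(n1, n0)))  # True
```

Because B(x + 1, y) / B(x, y) = x / (x + y), the ratio of marginals reduces to the count-based predictive probability (here 10/12). This is why samplers that marginalize out the block parameters can update cluster assignments cheaply.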
