Graph Clustering Via a Discrete Uncoupling Process

Top Cited Papers

1 January 2008

journal article
research article
Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Matrix Analysis and Applications

Vol. 30 (1), 121-141
https://doi.org/10.1137/040608635

Abstract

A discrete uncoupling process for finite spaces is introduced, called the Markov Cluster Process or the MCL process. The process is the engine for the graph clustering algorithm called the MCL algorithm. The MCL process takes a stochastic matrix as input, and then alternates expansion and inflation, each step defining a stochastic matrix in terms of the previous one. Expansion corresponds with taking the kth power of a stochastic matrix, where $k\in\N$. Inflation corresponds with a parametrized operator $\Gamma_r$, $r\geq 0$, that maps the set of (column) stochastic matrices onto itself. The image $\Gamma_r M$ is obtained by raising each entry in M to the rth power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit that is invariant under both matrix multiplication and inflation, with quadratic convergence around the limit points. The heuristic behind the process is its expected behavior for (Markov) graphs possessing cluster structure. The process is typically applied to the matrix of random walks on a given graph G, and the connected components of (the graph associated with) the process limit generically allow a clustering interpretation of G. The limit is in general extremely sparse and iterands are sparse in a weighted sense, implying that the MCL algorithm is very fast and highly scalable. Several mathematical properties of the MCL process are established. Most notably, the process (and algorithm) iterands posses structural properties generalizing the mapping from process limits onto clusterings. The inflation operator $\Gamma_r$ maps the class of matrices that are diagonally similar to a symmetric matrix onto itself. The phrase diagonally positive semi-definite (dpsd) is used for matrices that are diagonally similar to a positive semi-definite matrix. For $r\in\N$ and for M a stochastic dpsd matrix, the image $\Gamma_r M$ is again dpsd. Determinantal inequalities satisfied by a dpsd matrix M imply a natural ordering among the diagonal elements of M, generalizing the mapping of process limits onto clusterings. The spectrum of $\Gamma_{\infty} M$ is of the form $\{0^{n-k}, 1^k\}$, where k is the number of endclasses of the ordering associated with M, and n is the dimension of M. This attests to the uncoupling effect of the inflation operator.

Keywords

This publication has 34 references indexed in Scilit:

The genome of the social amoeba Dictyostelium discoideum
Nature, 2005
The complete genome sequence of Francisella tularensis, the causative agent of tularemia
Nature Genetics, 2005
Genome evolution in yeasts
Nature, 2004
The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics
PLoS Biology, 2003
Detection of functional modules from protein interaction networks
Proteins, 2003
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Genome Research, 2003
Target Selection and Determination of Function in Structural Genomics
IUBMB Life, 2003
An efficient algorithm for large-scale detection of protein families
Nucleic Acids Research, 2002
Contractive inhomogeneous products of non-negative matrices
Mathematical Proceedings of the Cambridge Philosophical Society, 1979
Diagonal similarity of irreducible matrices to row stochastic matrices
Pacific Journal of Mathematics, 1972

Cited by 401 articles