Graph Clustering Via a Discrete Uncoupling Process
Top Cited Papers
- 1 January 2008
- journal article
- research article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Matrix Analysis and Applications
- Vol. 30 (1), 121-141
- https://doi.org/10.1137/040608635
Abstract
A discrete uncoupling process for finite spaces is introduced, called the Markov Cluster Process or the MCL process. The process is the engine for the graph clustering algorithm called the MCL algorithm. The MCL process takes a stochastic matrix as input, and then alternates expansion and inflation, each step defining a stochastic matrix in terms of the previous one. Expansion corresponds with taking the kth power of a stochastic matrix, where $k\in\N$. Inflation corresponds with a parametrized operator $\Gamma_r$, $r\geq 0$, that maps the set of (column) stochastic matrices onto itself. The image $\Gamma_r M$ is obtained by raising each entry in M to the rth power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit that is invariant under both matrix multiplication and inflation, with quadratic convergence around the limit points. The heuristic behind the process is its expected behavior for (Markov) graphs possessing cluster structure. The process is typically applied to the matrix of random walks on a given graph G, and the connected components of (the graph associated with) the process limit generically allow a clustering interpretation of G. The limit is in general extremely sparse and iterands are sparse in a weighted sense, implying that the MCL algorithm is very fast and highly scalable. Several mathematical properties of the MCL process are established. Most notably, the process (and algorithm) iterands posses structural properties generalizing the mapping from process limits onto clusterings. The inflation operator $\Gamma_r$ maps the class of matrices that are diagonally similar to a symmetric matrix onto itself. The phrase diagonally positive semi-definite (dpsd) is used for matrices that are diagonally similar to a positive semi-definite matrix. For $r\in\N$ and for M a stochastic dpsd matrix, the image $\Gamma_r M$ is again dpsd. Determinantal inequalities satisfied by a dpsd matrix M imply a natural ordering among the diagonal elements of M, generalizing the mapping of process limits onto clusterings. The spectrum of $\Gamma_{\infty} M$ is of the form $\{0^{n-k}, 1^k\}$, where k is the number of endclasses of the ordering associated with M, and n is the dimension of M. This attests to the uncoupling effect of the inflation operator.
Keywords
This publication has 34 references indexed in Scilit:
- The genome of the social amoeba Dictyostelium discoideumNature, 2005
- The complete genome sequence of Francisella tularensis, the causative agent of tularemiaNature Genetics, 2005
- Genome evolution in yeastsNature, 2004
- The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative GenomicsPLoS Biology, 2003
- Detection of functional modules from protein interaction networksProteins, 2003
- OrthoMCL: Identification of Ortholog Groups for Eukaryotic GenomesGenome Research, 2003
- Target Selection and Determination of Function in Structural GenomicsIUBMB Life, 2003
- An efficient algorithm for large-scale detection of protein familiesNucleic Acids Research, 2002
- Contractive inhomogeneous products of non-negative matricesMathematical Proceedings of the Cambridge Philosophical Society, 1979
- Diagonal similarity of irreducible matrices to row stochastic matricesPacific Journal of Mathematics, 1972