GraphCrunch: A tool for large network analyses

Abstract
Background: The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well-established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been developed. Small over-represented subgraphs, called network motifs, have been introduced to identify simple building blocks of complex networks. Small induced subgraphs, called graphlets, have been used to develop "network signatures" that summarize network topologies. Based on these network signatures, two new highly sensitive measures of network local structural similarities were designed: the relative graphlet frequency distance (RGF-distance) and the graphlet degree distribution agreement (GDD-agreement). Finding adequate null models for biological networks is important in many research domains. Network properties are used to assess the fit of network models to the data. Various network models have been proposed. To date, no software tool measures the above-mentioned local network properties. Moreover, none of the existing tools compares real-world networks against a series of network models with respect to these local properties as well as a multitude of global network properties.

Results: We introduce GraphCrunch, a software tool that finds well-fitting network models by comparing large real-world networks against random graph models according to various network structural similarity measures. It is unique in computing the computationally expensive RGF-distance and GDD-agreement measures. In addition, it computes several standard global network measures and therefore supports the largest variety of network measures to date.
Also, it is the first software tool that compares real-world networks against a series of network models and that has built-in parallel computing capabilities, allowing the user to specify a list of machines on which to perform compute-intensive searches for local network properties. Furthermore, GraphCrunch is easily extendible to include additional network measures and models.

Conclusion: GraphCrunch is a software tool that implements the latest research on biological network models and properties: it compares real-world networks against a series of random graph models with respect to a multitude of local and global network properties. We present GraphCrunch as a comprehensive, parallelizable, and easily extendible software tool for analyzing and modeling large biological networks. The software is open-source and freely available at http://www.ics.uci.edu/~bio-nets/graphcrunch/. It runs under Linux, MacOS, and Windows Cygwin. In addition, it has an easy-to-use online web user interface that is available from the above web page.
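
The general idea of comparing a real-world network against a random graph model can be illustrated with a minimal Python sketch. This is not GraphCrunch's implementation. It assumes a single model (an Erdős–Rényi-style random graph with the same number of nodes and edges as the data) and a single global property (the average clustering coefficient), neither of which is named in the abstract. All function names and the toy edge list are hypothetical.

```python
import random
from itertools import combinations


def build_adjacency(n, edges):
    """Undirected adjacency sets for nodes 0..n-1 (assumes no self-loops or duplicate edges)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def erdos_renyi_same_size(n, m, rng):
    """Random graph with n nodes and exactly m edges chosen uniformly at random."""
    all_pairs = list(combinations(range(n), 2))
    return build_adjacency(n, rng.sample(all_pairs, m))


def compare_to_model(n, edges, samples=10, seed=0):
    """Return (property of the data, mean of the same property over model samples)."""
    rng = random.Random(seed)
    real = clustering_coefficient(build_adjacency(n, edges))
    model_values = [
        clustering_coefficient(erdos_renyi_same_size(n, len(edges), rng))
        for _ in range(samples)
    ]
    return real, sum(model_values) / samples


# Hypothetical toy "real" network: two triangles joined by one edge.
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

if __name__ == "__main__":
    real, model_mean = compare_to_model(6, TOY_EDGES)
    print(f"data: {real:.3f}  model mean: {model_mean:.3f}")
```

A large gap between the two values would suggest that this particular model fits the data poorly with respect to this one property. A tool of the kind described above repeats such comparisons over several models and many local and global properties.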