Sampling algorithms for pure network topologies
- 1 December 2005
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter
- Vol. 7 (2), 13-22
- https://doi.org/10.1145/1117454.1117457
Abstract
In a time of information glut, observations about complex systems and phenomena of interest are available in several applications areas, such as biology and text. As a consequence, scientists have started searching for patterns that involve interactions among the objects of analysis, to the effect that research on models and algorithms for network analysis has become a central theme for knowledge discovery and data mining (KDD). The intuitions behind the plethora of approaches rely upon few basic types of networks, identified by specific local and global topological properties, which we term "pure" topology types.In this paper, (1) we survey pure topology types along with existing sampling algorithms that generate them, (2) we introduce novel algorithms that enhance the diversity of samples, and address the case of cellular topologies, (3) we perform statistical studies of the stability of the properties of pure types to alternative generative algorithms, and a joint study of the separability of pure types, in terms of their embedding in a space of metrics for network analysis, widely adopted in the social and physical sciences.We conclude with a word of caution to the practitioners, who sample pure topology types to assess the "statistical significance" of their findings, e.g., the p-value of the clustering coefficient is sensitive to the sampling algorithm used. We find that different pure types share similar topological properties. Further, real world networks hardly present the variability profile of a single pure type. We suggest the assumption of "mixtures of types" as an alternative starting point for developing models and algorithms for network analysis.Keywords
This publication has 28 references indexed in Scilit:
- How to search a social networkSocial Networks, 2005
- Graph-based relational learningACM SIGKDD Explorations Newsletter, 2003
- Prospects and challenges for multi-relational data miningACM SIGKDD Explorations Newsletter, 2003
- Link miningACM SIGKDD Explorations Newsletter, 2003
- Statistical mechanics of complex networksReviews of Modern Physics, 2002
- Graph-based data miningIEEE Intelligent Systems and their Applications, 2000
- Models of core/periphery structuresSocial Networks, 2000
- On power-law relationships of the Internet topologyACM SIGCOMM Computer Communication Review, 1999
- The nature of the social agent*The Journal of Mathematical Sociology, 1994
- Statistical Analysis of Multiple Sociometric RelationsJournal of the American Statistical Association, 1985