Building a distributed full-text index for the web
- 1 July 2001
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 19 (3), 217-241
- https://doi.org/10.1145/502115.502116
Abstract
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
This publication has 13 references indexed in Scilit:
- WebBase: a repository of Web pages. Computer Networks, 2000
- Accessibility of information on the web. Nature, 1999
- Query performance for tightly coupled distributed digital libraries. Published by Association for Computing Machinery (ACM), 1998
- STARTS. Published by Association for Computing Machinery (ACM), 1997
- Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 1996
- Supporting full-text information retrieval with a persistent object store. Lecture Notes in Computer Science, 1994
- Query processing and inverted indices in shared-nothing text document information retrieval systems. The VLDB Journal, 1993
- Structuring Text within a Relational System. Published by Springer Science and Business Media LLC, 1992
- An extended relational document retrieval model. Information Processing & Management, 1988
- Signature files. ACM Transactions on Information Systems, 1984