Building a distributed full-text index for the Web

1 July 2001

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems

Vol. 19 (3), 217-241
https://doi.org/10.1145/502115.502116

Abstract

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
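The abstract mentions collecting global statistics from distributed inverted indexes. The sketch below shows why this is needed and one simple way to do it. It assumes a document-partitioned setup, where each node indexes a disjoint subset of pages. Because no document lives on two nodes, a term's global document frequency is the sum of its per-node frequencies. The function names, the toy documents, and the lowercase whitespace tokenization are all hypothetical. This is only an illustration, not necessarily one of the strategies the paper compares.

```python
import math
from collections import Counter, defaultdict

def build_local_index(docs):
    """Hypothetical helper: build term -> sorted doc-id postings for one node's documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumed tokenizer: lowercase + whitespace split
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def local_df(index):
    """Document frequency of each term on a single node (length of its postings list)."""
    return Counter({term: len(postings) for term, postings in index.items()})

def global_df(local_dfs):
    """Sum per-node frequencies; correct only because the nodes hold disjoint documents."""
    total = Counter()
    for df in local_dfs:
        total.update(df)  # Counter.update adds counts rather than replacing them
    return total

def idf(term, df, n_docs):
    """Inverse document frequency from the aggregated statistics; 0.0 for unseen terms."""
    return math.log(n_docs / df[term]) if df[term] else 0.0

# Two hypothetical nodes, each holding a disjoint set of pages.
nodes = [{1: "web index", 2: "distributed web"}, {3: "inverted index"}]
indexes = [build_local_index(node) for node in nodes]
df = global_df(local_df(ix) for ix in indexes)
n_docs = sum(len(node) for node in nodes)
print(df["web"], df["index"], df["inverted"])  # 2 2 1
```

Computing `idf` from each node's local counts alone would give inconsistent scores across nodes. That inconsistency is the motivation for gathering global statistics in the first place.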
