Better external memory suffix array construction
- 1 June 2008
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Journal of Experimental Algorithmics
- Vol. 12, 1-24
- https://doi.org/10.1145/1227161.1402296
Abstract
Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications, in particular, in bioinformatics. However, so far, it has appeared prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on average. Our implementation can construct suffix arrays for inputs of up to 4-GB in hours on a low-cost machine. As a tool of possible independent interest, we present a systematic way to design, analyze, and implement pipelined algorithms.Keywords
This publication has 10 references indexed in Scilit:
- Linear work suffix array constructionJournal of the ACM, 2006
- Scalable Parallel Suffix Array ConstructionLecture Notes in Computer Science, 2006
- Stxxl: Standard Template Library for XXL Data SetsLecture Notes in Computer Science, 2005
- Asynchronous parallel disk sortingPublished by Association for Computing Machinery (ACM) ,2003
- A Theoretical and Experimental Study on the Construction of Suffix Arrays in External MemoryAlgorithmica, 2002
- On the sorting-complexity of suffix tree constructionJournal of the ACM, 2000
- Using MPI-2Published by MIT Press ,1999
- On sorting strings in external memory (extended abstract)Published by Association for Computing Machinery (ACM) ,1997
- Suffix Arrays: A New Method for On-Line String SearchesSIAM Journal on Computing, 1993
- The input/output complexity of sorting and related problemsCommunications of the ACM, 1988