On-Line Approximate String Matching with Bounded Errors
- 7 June 2008
- book chapter
- conference paper
- Published by Springer Science and Business Media LLC
Abstract
We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ε so that the algorithm is allowed to miss occurrences with probability ε. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us breaking the average-case optimal lower bound of the classical problem, achieving average case O(nlog σ m/m) time with any \(\epsilon = \textrm{poly}(k/m)\), where n is the text size, m the pattern length, k the number of errors for edit distance, and σ the alphabet size. Our experimental results show the practicality of this novel and promising research direction.
Keywords
This publication has 7 references indexed in Scilit:
- Average-optimal single and multiple approximate string matchingACM Journal of Experimental Algorithmics, 2004
- Large deviations for sums of partly dependent random variablesRandom Structures & Algorithms, 2004
- Algorithms columnACM SIGACT News, 2003
- A guided tour to approximate string matchingACM Computing Surveys, 2001
- Sublinear approximate string matching and biological applicationsAlgorithmica, 1994
- Approximate string matching and local similarityLecture Notes in Computer Science, 1994
- Approximate string-matching with q-grams and maximal matchesTheoretical Computer Science, 1992