Probabilistic discovery of time series motifs
- 24 August 2003
- conference paper
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 493-498
- https://doi.org/10.1145/956750.956808
Abstract
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise.Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or "don't care" symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.Keywords
This publication has 17 references indexed in Scilit:
- Efficient large-scale sequence comparison by locality-sensitive hashingBioinformatics, 2001
- Finding motifs using random projectionsPublished by Association for Computing Machinery (ACM) ,2001
- An Updated Bibliography of Temporal, Spatial, and Spatio-temporal Data Mining ResearchLecture Notes in Computer Science, 2001
- Probabilistic and Statistical Properties of Words: An OverviewJournal of Computational Biology, 2000
- Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequenciesJournal of Molecular Biology, 1998
- Studies in Astronomical Time Series Analysis. V. Bayesian Blocks, a New Method to Analyze Structure in Photon Counting DataThe Astrophysical Journal, 1998
- Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm.Bioinformatics, 1998
- Unsupervised learning of multiple motifs in biopolymers using expectation maximizationMachine Learning, 1995
- Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple AlignmentScience, 1993
- An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequencesProteins-Structure Function and Bioinformatics, 1990