Efficient Motif Discovery for Large-Scale Time Series in Healthcare
- 9 March 2015
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Industrial Informatics
- Vol. 11 (3), 583-590
- https://doi.org/10.1109/tii.2015.2411226
Abstract
Analyzing time series data can reveal the temporal behavior of the underlying mechanism that produces the data. Time series motifs, which are similar subsequences or frequently occurring patterns, are especially meaningful to researchers in the medical domain. With the fast growth of time series data, traditional methods for motif discovery are inefficient and not applicable to large-scale data. This work proposes an efficient Motif Discovery method for Large-scale time series (MDLats). By computing standard motifs, MDLats eliminates most of the redundant computation found in related methods and maximizes the reuse of existing information. All the motif types and subsequences are generated for subsequent analysis and classification. Our system is implemented on a Hadoop platform and deployed in a hospital for clinical electrocardiography classification. Experiments on real-world healthcare data show that MDLats outperforms the state-of-the-art methods even on large time series.
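To make the notion of a motif concrete, the sketch below finds the closest pair of non-overlapping subsequences of a fixed length using a brute-force search over z-normalized windows. It is not the MDLats algorithm, whose details the abstract does not give; it is the quadratic baseline that efficient methods aim to improve on. The function names `z_normalize` and `find_motif_pair`, the window length `m`, and the toy series are illustrative assumptions.

```python
import math


def z_normalize(window):
    """Rescale a window to zero mean and unit variance (all zeros if it is flat)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, compared by Euclidean distance after
    z-normalization. Brute force: O(n^2) window comparisons.
    """
    if m < 1 or len(series) < 2 * m:
        raise ValueError("series must hold at least two non-overlapping windows")
    count = len(series) - m + 1
    windows = [z_normalize(series[i:i + m]) for i in range(count)]
    best = (math.inf, -1, -1)
    for i in range(count):
        # Start j at i + m so the two windows never overlap (no trivial matches).
        for j in range(i + m, count):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Toy series (illustrative, not real ECG data): the same bump appears twice.
toy_series = [0, 0, 1, 3, 1, 0, 0, 5, 0, 0, 1, 3, 1, 0, 0]
distance, first, second = find_motif_pair(toy_series, 5)
print(distance, first, second)
```

Every pair of windows is compared here, so the cost grows quadratically with the length of the series. That growth is what makes such a baseline impractical for large-scale data.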