Mining Recurring Concept Drifts with Limited Labeled Streaming Data
- 1 February 2012
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Intelligent Systems and Technology
- Vol. 3 (2), 1-32
- https://doi.org/10.1145/2089094.2089105
Abstract
Tracking recurring concept drifts is a significant issue for machine learning and data mining that frequently appears in real-world stream classification problems. It is a challenge for many streaming classification algorithms to learn recurring concepts in a data stream environment with unlabeled data, and this challenge has received little attention from the research community. Motivated by this challenge, this article focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a semi-supervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which a decision tree is adopted as the classification model. When growing a tree, a clustering algorithm based on k-means is installed to produce concept clusters and unlabeled data are labeled in the method of majority-class at leaves. In view of deviations between history and new concept clusters, potential concept drifts are distinguished and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of our REDLLA algorithm over three state-of-the-art online classification algorithms of CVFDT, DWCDS, and CDRDT and several known online semi-supervised algorithms, even in the case with more than 90% unlabeled data.
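The abstract names three ingredients: k-means concept clusters, majority-class labeling of unlabeled data, and a deviation between history and new clusters. The sketch below illustrates these ideas in isolation and is not the REDLLA implementation. It omits the decision tree entirely, and the function names (`kmeans`, `label_by_majority`, `cluster_deviation`), the first-k-points initialization, and the nearest-centroid deviation measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def label_by_majority(points, labels, k):
    """Fill None labels with the majority class of the point's cluster."""
    centroids, assign = kmeans(points, k)
    # Fallback class for clusters that contain no labeled point at all.
    overall = Counter(l for l in labels if l is not None).most_common(1)[0][0]
    filled = list(labels)
    for c in range(k):
        votes = Counter(labels[i] for i in range(len(points))
                        if assign[i] == c and labels[i] is not None)
        majority = votes.most_common(1)[0][0] if votes else overall
        for i in range(len(points)):
            if assign[i] == c and filled[i] is None:
                filled[i] = majority
    return filled, centroids


def cluster_deviation(old_centroids, new_centroids):
    """Mean distance from each new centroid to its nearest old centroid.

    A hypothetical stand-in for the deviation between history and new
    concept clusters; the article's actual measure may differ.
    """
    return sum(min(math.dist(n, o) for o in old_centroids)
               for n in new_centroids) / len(new_centroids)


# Six points in two well-separated groups, with only one label per group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", None, None, "b", None, None]
filled, centroids = label_by_majority(points, labels, k=2)
# filled == ["a", "a", "a", "b", "b", "b"]
```

In a streaming setting, a deviation above some user-chosen threshold between the stored centroids and those of the newest batch would be read as a potential drift, and a small deviation from a stored older cluster set as a recurring concept. The threshold itself is not specified in the abstract.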
Funding Information
- Chinese Academy of Sciences (2010HGXJ-0715)
- National Natural Science Foundation of China (6.08E+15)
- Ministry of Science and Technology of the People's Republic of China (2009CB326203)
- Fundamental Research Funds for the Central Universities of China (2011HGZY0003)
This publication has 27 references indexed in Scilit:
- A Random Decision Tree Ensemble for Mining Concept Drifts from Noisy Data Streams. Applied Artificial Intelligence, 2010
- Semi-supervised learning by disagreement. Knowledge and Information Systems, 2009
- Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowledge and Information Systems, 2009
- Parameter Estimation in Semi-Random Decision Tree Ensembling on Streaming Data. Lecture Notes in Computer Science, 2009
- Improving the Performance of Data Stream Classifiers by Mining Recurring Contexts. Lecture Notes in Computer Science, 2006
- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments. Lecture Notes in Computer Science, 2005
- Semi-Supervised Learning on Riemannian Manifolds. Machine Learning, 2004
- A streaming ensemble algorithm (SEA) for large-scale classification. Published by Association for Computing Machinery (ACM), 2001
- Mining high-speed data streams. Published by Association for Computing Machinery (ACM), 2000
- Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 1963