COMPOSE: A Semisupervised Learning Framework for Initially Labeled Nonstationary Streaming Data

Abstract

An increasing number of real-world applications involve streaming data drawn from nonstationary distributions that drift over time. These applications demand new algorithms that can learn and adapt to such changes, also known as concept drift. Proper characterization of such data with existing approaches typically requires a substantial number of labeled instances, which may be difficult, expensive, or even impractical to obtain. In this paper, we introduce compacted object sample extraction (COMPOSE), a computational geometry-based framework to learn from nonstationary streaming data, where labels are unavailable (or presented very sporadically) after initialization. We introduce the algorithm in detail, and discuss its results and performance on several synthetic and real-world data sets, which demonstrate the ability of the algorithm to learn under several different scenarios of initially labeled streaming environments. On carefully designed synthetic data sets, we compare the performance of COMPOSE against the optimal Bayes classifier, as well as the arbitrary subpopulation tracker algorithm, which addresses a similar environment referred to as extreme verification latency. Furthermore, using the real-world National Oceanic and Atmospheric Administration weather data set, we demonstrate that COMPOSE is competitive even with a well-established and fully supervised nonstationary learning algorithm that receives labeled data in every batch.
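The initially labeled setting described in the abstract can be illustrated with a toy loop. The sketch below is not the paper's implementation. It assumes two drifting Gaussian classes, uses a 1-nearest-neighbor rule in place of a semisupervised learner, and replaces the computational-geometry compaction step with a much cruder stand-in that keeps the points closest to each class centroid. All function names and parameters (`label_batch`, `extract_core`, `run_stream`, `keep`, `drift`) are hypothetical.

```python
import numpy as np


def label_batch(core_X, core_y, X):
    """Label each row of X with the class of its nearest core point (1-NN)."""
    d = np.linalg.norm(X[:, None, :] - core_X[None, :, :], axis=2)
    return core_y[d.argmin(axis=1)]


def extract_core(X, y, keep=0.5):
    """Keep the fraction `keep` of each class closest to its centroid.

    Assumption: a simple stand-in for geometric compaction, not the
    method used in the paper.
    """
    kept_X, kept_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        dist = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
        n = max(1, int(keep * len(Xc)))
        kept_X.append(Xc[np.argsort(dist)[:n]])
        kept_y.append(np.full(n, c))
    return np.vstack(kept_X), np.concatenate(kept_y)


def sample(rng, means, n_per_class):
    """Draw one batch: n_per_class points per class, true labels alongside."""
    X = np.vstack([rng.normal(m, 0.5, (n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


def run_stream(n_steps=20, n_per_class=100, drift=0.25, seed=0):
    """Return per-step accuracy of the tracking loop and of a static baseline."""
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 0.0], [4.0, 0.0]])

    # Labels are available only for this initial batch.
    X0, y0 = sample(rng, means, n_per_class)
    core_X, core_y = extract_core(X0, y0)
    init_X, init_y = core_X.copy(), core_y.copy()

    tracked, static = [], []
    for _ in range(n_steps):
        means = means + np.array([drift, 0.0])  # both classes shift along x
        X, y_true = sample(rng, means, n_per_class)  # y_true used for scoring only

        y_pred = label_batch(core_X, core_y, X)
        tracked.append(float((y_pred == y_true).mean()))
        static.append(float((label_batch(init_X, init_y, X) == y_true).mean()))

        # Predicted labels, not true labels, seed the next time step.
        core_X, core_y = extract_core(X, y_pred)
    return tracked, static


if __name__ == "__main__":
    tracked, static = run_stream()
    print("tracking loop, final accuracy:", tracked[-1])
    print("static baseline, final accuracy:", static[-1])
```

In this toy setup the static baseline degrades once the first class drifts into the region originally occupied by the second, whereas the loop that carries forward its own compacted, self-labeled samples keeps following both classes. The drift here is gradual relative to the class separation; the sketch says nothing about abrupt changes or overlapping classes.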
