Unsupervised Learning of Object Features from Video Sequences

Abstract

We develop an efficient algorithm for unsupervised learning of object models, represented as constellations of features, from low-resolution video sequences. The input images typically contain one or more objects that change in pose, scale, and degree of occlusion, and the objects can move significantly between consecutive frames. Because the content of an input sequence is unlabeled, the learner must cluster the data based on its implicit coherence over time and space. Our approach exploits the dependent pairwise co-occurrences of an object's features within local neighborhoods, in contrast to the independent behavior of unrelated features. We couple or decouple pairs of features based on a probabilistic interpretation of their pairwise statistics, and then extract objects as connected components of features.
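
The couple-then-group idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the probabilistic test, so the sketch assumes a simple stand-in rule that couples two features when their observed joint frequency exceeds `ratio` times the frequency expected if they were independent. The function names, the `ratio` and `min_joint` parameters, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def couple_features(neighborhoods, ratio=1.5, min_joint=2):
    """Return feature pairs that co-occur more often than independence predicts.

    ASSUMPTION: this frequency-ratio rule stands in for the paper's
    probabilistic test, which the abstract does not describe.
    """
    n = len(neighborhoods)
    single = Counter()  # number of neighborhoods containing each feature
    joint = Counter()   # number of neighborhoods containing each pair
    for hood in neighborhoods:
        feats = sorted(set(hood))
        single.update(feats)
        joint.update(combinations(feats, 2))

    edges = []
    for (a, b), count in joint.items():
        p_joint = count / n
        p_independent = (single[a] / n) * (single[b] / n)
        if count >= min_joint and p_joint > ratio * p_independent:
            edges.append((a, b))
    return edges


def connected_components(features, edges):
    """Group features into connected components with union-find."""
    parent = {f: f for f in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


def extract_objects(neighborhoods, ratio=1.5, min_joint=2):
    """Couple features, then report components with at least two features."""
    features = sorted({f for hood in neighborhoods for f in hood})
    edges = couple_features(neighborhoods, ratio, min_joint)
    components = connected_components(features, edges)
    return sorted(sorted(c) for c in components if len(c) > 1)


# Toy data (hypothetical): each set lists the feature ids seen together in
# one local neighborhood of one frame.
neighborhoods = [
    {"a1", "a2", "a3"}, {"a1", "a2", "a3"}, {"a1", "a2"}, {"a2", "a3"},
    {"b1", "b2"}, {"b1", "b2"}, {"b1", "b2"},
    {"a1", "b1"},                          # one accidental co-occurrence
    {"n"}, {"n"}, {"n", "a1"}, {"n", "b2"},  # unrelated background feature
]
objects = extract_objects(neighborhoods)
print(objects)
```

On this toy input the features `a1`, `a2`, `a3` end up in one component and `b1`, `b2` in another, while the background feature `n` and the single accidental `a1`/`b1` co-occurrence are not coupled. The union-find grouping is a standard way to compute connected components; any graph traversal would serve equally well.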

