Anticipating Visual Representations from Unlabeled Video

Open Access

1 June 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 98-106
https://doi.org/10.1109/cvpr.2016.18

Abstract

Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is readily available unlabeled video. We present a framework that capitalizes on temporal structure in unlabeled video to learn to anticipate human actions and objects. The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future. Visual representations are a promising prediction target because they encode images at a higher semantic level than pixels yet can be computed automatically. We then apply recognition algorithms to our predicted representation to anticipate objects and actions. We experimentally validate this idea on two datasets, anticipating actions one second in the future and objects five seconds in the future.
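
The two-stage idea in the abstract (regress the future visual representation from the current frame, then run a recognizer on the predicted representation) can be illustrated with a small sketch. This is not the paper's implementation: the deep network is replaced here by a plain linear map trained with stochastic gradient descent, the feature vectors are synthetic, the "temporal dynamics" are an assumed cyclic shift, and the recognizer is a toy argmax rule. All function names (make_pairs, train_predictor, predict_future, recognize, anticipation_accuracy) are hypothetical and chosen only for illustration.

```python
import random


def make_pairs(num_pairs, dim, rng):
    """Build synthetic (current, future) feature pairs.

    Assumption: a cyclic shift stands in for the unknown temporal dynamics
    that real unlabeled video would supply for free.
    """
    pairs = []
    for _ in range(num_pairs):
        current = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        future = current[1:] + current[:1]
        pairs.append((current, future))
    return pairs


def predict_future(weights, current):
    """Apply the linear predictor to a current-frame feature vector."""
    return [sum(w * x for w, x in zip(row, current)) for row in weights]


def train_predictor(pairs, dim, lr=0.1, epochs=20):
    """Fit the predictor by SGD on the squared error to the future feature.

    No labels are used: the future feature itself is the training target.
    """
    weights = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for current, future in pairs:
            predicted = predict_future(weights, current)
            for i in range(dim):
                error = predicted[i] - future[i]
                for j in range(dim):
                    weights[i][j] -= lr * error * current[j]
    return weights


def recognize(feature):
    """Toy recognizer: the label is the index of the largest component."""
    return max(range(len(feature)), key=lambda i: feature[i])


def anticipation_accuracy(weights, pairs):
    """Fraction of pairs where the label anticipated from the current frame
    matches the label recognized on the true future feature."""
    hits = sum(
        recognize(predict_future(weights, current)) == recognize(future)
        for current, future in pairs
    )
    return hits / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    train, held_out = make_pairs(200, 4, rng), make_pairs(50, 4, rng)
    weights = train_predictor(train, dim=4)
    print("held-out anticipation accuracy:",
          anticipation_accuracy(weights, held_out))
```

The design point the sketch tries to capture is that the recognizer never needs to be retrained for anticipation: it is applied unchanged to the predicted representation, so any labels are needed only for recognition, not for learning the temporal prediction.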

Other Versions

Version 2, 2015-04-29, preprints

Cited by 352 articles