Learning Image Representations Tied to Ego-Motion
- 1 December 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2015 IEEE International Conference on Computer Vision (ICCV)
- pp. 1413–1421
- https://doi.org/10.1109/iccv.2015.166
Abstract
Understanding how images of objects and scenes behave in response to specific ego-motions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose to exploit proprioceptive motor signals to provide unsupervised regularization in convolutional neural networks to learn visual representations from egocentric video. Specifically, we enforce that our learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct ego-motions. With three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
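The equivariance objective described above can be illustrated with a minimal numerical sketch. Under the assumption that each discretized ego-motion class g is paired with a learned linear map M_g acting on the feature space, "responding predictably" amounts to penalizing the distance between the mapped features of a frame and the features of the frame observed after the motion. All names, dimensions, and the random linear feature extractor below are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: input size, feature size, number of
# discretized ego-motion classes.
D_IN, D_FEAT, N_MOTIONS = 32, 8, 4

# Stand-in for a learned feature extractor z(x); the paper uses a CNN,
# here a fixed random linear map keeps the sketch self-contained.
W = rng.normal(size=(D_FEAT, D_IN))

def features(x):
    """Feature embedding z(x) of an image vector x."""
    return W @ x

# One transformation matrix M_g per ego-motion class, to be learned
# jointly with the feature extractor.
M = rng.normal(size=(N_MOTIONS, D_FEAT, D_FEAT))

def equivariance_loss(x_t, x_t1, g):
    """Penalize || M_g z(x_t) - z(x_{t+1}) ||^2 for a frame pair
    related by ego-motion class g."""
    residual = M[g] @ features(x_t) - features(x_t1)
    return float(residual @ residual)

# Frame pair from egocentric video, related by motion class g=2.
x_t = rng.normal(size=D_IN)
x_t1 = rng.normal(size=D_IN)
loss = equivariance_loss(x_t, x_t1, g=2)
```

In training, this term would act as an unsupervised regularizer alongside a supervised recognition loss, with gradients flowing into both the feature extractor and the per-motion maps.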