DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
- 1 December 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 2722-2730
- https://doi.org/10.1109/iccv.2015.312
Abstract
Today, there are two major paradigms for vision-based autonomous driving systems: mediated perception approaches that parse an entire scene to make a driving decision, and behavior reflex approaches that directly map an input image to a driving action by a regressor. In this paper, we propose a third paradigm: a direct perception approach to estimate the affordance for driving. We propose to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving. Our representation provides a compact yet complete description of the scene, enabling a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, we argue that our direct perception representation provides the right level of abstraction. To demonstrate this, we train a deep Convolutional Neural Network using recordings of 12 hours of human driving in a video game and show that our model can drive a car well in a very diverse set of virtual environments. We also train a model for car distance estimation on the KITTI dataset. Results show that our direct perception approach generalizes well to real driving images. Source code and data are available on our project website.
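To make the direct perception idea concrete, here is a minimal sketch of how a small set of affordance indicators could feed a simple lane-centering controller. The indicator names, units, and controller gains below are illustrative assumptions, not the paper's actual 13 indicators or its published controller logic:

```python
from dataclasses import dataclass

# Hypothetical affordance indicators in the spirit of the paper's
# direct-perception representation (names and units are illustrative).
# In the paper, a CNN regresses such indicators from a single image.
@dataclass
class Affordances:
    angle: float             # heading angle relative to the road tangent (rad)
    to_marking_left: float   # lateral distance to the left lane marking (m)
    to_marking_right: float  # lateral distance to the right lane marking (m)
    dist_preceding: float    # distance to the preceding car (m)

def steering_command(a: Affordances, lane_width: float = 4.0,
                     gain_angle: float = 1.0, gain_lateral: float = 0.3) -> float:
    """Toy lane-centering controller: steer to cancel the heading error
    and the offset from the lane center (negative = steer left)."""
    center_offset = (a.to_marking_left - a.to_marking_right) / 2.0
    return -gain_angle * a.angle - gain_lateral * center_offset / lane_width

# Example: car slightly right of center, heading parallel to the road.
cmd = steering_command(Affordances(angle=0.0, to_marking_left=2.5,
                                   to_marking_right=1.5, dist_preceding=30.0))
```

The point of the sketch is the division of labor the abstract describes: the learned model only has to estimate a handful of scalars, after which hand-written control logic of a few lines suffices to drive.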
This publication has 15 references indexed in Scilit:
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- Scalable Object Detection Using Deep Neural Networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- 3D Traffic Scene Understanding From Movable Platforms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013
- Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 2013
- Sparse scene flow segmentation for moving object detection in urban environments. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009
- Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, 2001
- Against direct perception. Behavioral and Brain Sciences, 1980
- Nonlinear Effects in the Dynamics of Car Following. Operations Research, 1961