Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
- 1 June 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 IEEE Conference on Computer Vision and Pattern Recognition
- pp. 580–587
- https://doi.org/10.1109/cvpr.2014.81
Abstract
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
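The abstract describes R-CNN only as "CNN features on bottom-up region proposals". The following is a minimal sketch of the shape of that test-time loop, not the authors' code, under stated assumptions: proposals arrive as precomputed boxes (the full paper obtains them from selective search), `toy_features` is a hypothetical stand-in for the CNN forward pass on each warped proposal, each class is scored with a linear model (the paper uses per-class linear SVMs), and overlapping detections are pruned per class with greedy non-maximum suppression. The names `detect`, `toy_features`, and `class_models`, and the 0.5 IoU threshold, are illustrative assumptions; the paper learns its own suppression threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float) -> List[int]:
    """Visit boxes from highest to lowest score; keep a box unless it overlaps
    an already-kept box by more than iou_thresh. Returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def detect(
    proposals: Sequence[Box],
    extract_features: Callable[[Box], List[float]],
    class_models: Dict[str, Tuple[List[float], float]],
    iou_thresh: float = 0.5,  # assumed value; R-CNN learns this threshold
) -> Dict[str, List[Tuple[Box, float]]]:
    """R-CNN-style scoring: one feature vector per proposal, a linear score
    w.f + b per class, then greedy NMS applied to each class independently."""
    feats = [extract_features(p) for p in proposals]  # CNN forward pass in R-CNN
    detections: Dict[str, List[Tuple[Box, float]]] = {}
    for cls, (w, b) in class_models.items():
        scores = [sum(wi * fi for wi, fi in zip(w, f)) + b for f in feats]
        kept = greedy_nms(proposals, scores, iou_thresh)
        detections[cls] = [(proposals[i], scores[i]) for i in kept]
    return detections


def toy_features(box: Box) -> List[float]:
    """Hypothetical stand-in for CNN features: scaled box width and height."""
    return [(box[2] - box[0]) / 100, (box[3] - box[1]) / 100]


if __name__ == "__main__":
    proposals = [(0, 0, 100, 50), (5, 0, 105, 55), (200, 200, 220, 260)]
    models = {"wide": ([1.0, -1.0], 0.0), "tall": ([-1.0, 1.0], 0.0)}
    dets = detect(proposals, toy_features, models)
    print([box for box, _ in dets["wide"]])  # [(0, 0, 100, 50), (200, 200, 220, 260)]
    print([box for box, _ in dets["tall"]])  # [(200, 200, 220, 260), (5, 0, 105, 55)]
```

In the usage example the first two proposals overlap heavily (IoU about 0.83), so for each class only the higher-scoring one survives suppression. Keeping feature extraction separate from the per-class scorers mirrors the decoupling the abstract describes: the same region features can be reused by every class model, and the feature extractor can be pre-trained and fine-tuned independently of the detectors.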