Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Abstract

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
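The first idea in the abstract, scoring bottom-up region proposals with a learned feature extractor, can be illustrated with a minimal sketch. This is not the authors' implementation: the boxes are hand-written stand-ins for real region proposals, `extract_features` is a hypothetical placeholder for a CNN (it only flattens the patch), and the weights of the linear scorer are made up for the toy image. All function names are assumptions introduced for illustration.

```python
def warp_region(image, box, size=4):
    """Crop box (x0, y0, x1, y1) and resample it to size x size (nearest neighbor)."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    return [
        [image[y0 + (r * h) // size][x0 + (c * w) // size] for c in range(size)]
        for r in range(size)
    ]


def extract_features(patch):
    """Hypothetical stand-in for a CNN: flatten the fixed-size patch into a vector."""
    return [float(v) for row in patch for v in row]


def score_region(features, weights, bias):
    """One linear score per class, computed on the region's feature vector."""
    return [sum(f * w for f, w in zip(features, ws)) + b
            for ws, b in zip(weights, bias)]


def detect(image, boxes, weights, bias):
    """Return (box, best_class, best_score) for every proposed region."""
    results = []
    for box in boxes:
        feats = extract_features(warp_region(image, box))
        scores = score_region(feats, weights, bias)
        label = max(range(len(scores)), key=scores.__getitem__)
        results.append((box, label, scores[label]))
    return results


# Toy image: 8 rows x 12 columns with a bright 6 x 6 square in the top-left corner.
image = [[1.0 if (r < 6 and c < 6) else 0.0 for c in range(12)] for r in range(8)]
boxes = [(0, 0, 6, 6), (6, 0, 12, 8)]   # hand-written stand-ins for proposals
weights = [[1.0] * 16, [-1.0] * 16]     # made-up scorers: class 0 "bright", class 1 "dark"
bias = [0.0, 8.0]

detections = detect(image, boxes, weights, bias)
```

Warping every proposal to one fixed size is what lets a single feature extractor handle regions of any shape. In a real system the flattening step would be replaced by a pre-trained and fine-tuned network, as the second idea in the abstract describes.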

Funding Information

US National Science Foundation (IIS-0905647, IIS-1134072, IIS-1212798, MURI N000014-10-1-0933)

Cited by 1833 articles