Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning

Abstract

Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.
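
The iterative procedure described in the abstract can be illustrated with a minimal sketch. It assumes that "multi-fold" works like cross-validation folds: the positive images are split into folds, and the windows in each fold are re-localized by a detector trained only on the other folds, so an image's own current window cannot reinforce itself. The ridge-regression detector, the function names (`train_linear`, `score`, `multifold_mil`), the initialization, and all parameter values are hypothetical stand-ins chosen for illustration, not the features or classifier named in the abstract.

```python
import numpy as np


def train_linear(X_pos, X_neg, reg=1.0):
    """Stand-in detector (assumption): ridge least squares on +1/-1 targets."""
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def score(w, X):
    """Linear detector score for each row (candidate window) of X."""
    return X @ w[:-1] + w[-1]


def multifold_mil(pos_bags, neg_feats, n_folds=3, n_iters=5, seed=0):
    """Select one window per positive image.

    pos_bags:  list of (n_windows_i, d) arrays, one per positive image.
    neg_feats: (m, d) array of window features from negative images.
    Returns an integer array with the selected window index per image.
    """
    rng = np.random.default_rng(seed)
    n = len(pos_bags)
    # Arbitrary initialization (assumption): first candidate window per image.
    selected = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        folds = np.array_split(rng.permutation(n), n_folds)
        new_selected = selected.copy()
        for held_out in folds:
            held = set(held_out.tolist())
            train_idx = [i for i in range(n) if i not in held]
            # Train on the current windows of all other folds plus negatives.
            X_pos = np.stack([pos_bags[i][selected[i]] for i in train_idx])
            w = train_linear(X_pos, neg_feats)
            # Re-localize only the held-out images with this detector.
            for i in held_out:
                new_selected[i] = int(np.argmax(score(w, pos_bags[i])))
        selected = new_selected
    return selected
```

Calling `multifold_mil(pos_bags, neg_feats)` returns one window index per positive image. Standard multiple instance learning corresponds to training on all positive images and re-localizing those same images; the sketch differs only in holding each fold out of the detector that re-localizes it.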

Funding Information

AXES
ERC
ALLEGRO
LEAR team
Inria Grenoble Rhône-Alpes
Laboratoire Jean Kuntzmann
CNRS
University Grenoble Alpes, France

