Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning
- 26 February 2016
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 39 (1), 189-203
- https://doi.org/10.1109/tpami.2016.2535231
Abstract
Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.
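The abstract describes the training loop only in outline, so the following is a minimal sketch of how a multi-fold multiple instance learning loop might look, not the authors' implementation. It assumes the folds work like cross-validation: the positive images are split into folds, a detector is trained on the currently selected windows of the other folds plus windows from negative images, and that detector then re-localizes the held-out fold, so that no image is re-localized by a detector trained on its own window. The names `multi_fold_mil` and `train_linear`, the ridge-regularised least-squares classifier (a stand-in for whatever detector is actually used), the missing bias term, and the initialisation with the first window are all assumptions made for illustration.

```python
import numpy as np


def train_linear(X, y, reg=1.0):
    """Stand-in detector: ridge-regularised least squares on labels in {-1, +1}."""
    d = X.shape[1]
    A = X.T @ X + reg * np.eye(d)
    return np.linalg.solve(A, X.T @ y)


def multi_fold_mil(pos_bags, neg_windows, n_folds=3, n_iters=5, seed=0):
    """Sketch of multi-fold MIL.

    pos_bags:    list of (n_windows_i, d) arrays, one per positive image.
    neg_windows: (m, d) array of window features from negative images.
    Returns the index of the selected window for each positive image.
    Requires len(pos_bags) >= n_folds >= 2 so every training set is non-empty.
    """
    rng = np.random.default_rng(seed)
    n = len(pos_bags)
    # Hypothetical initialisation: start from the first window of each image.
    selected = [0] * n
    # Assign each positive image to one fold, in balanced fashion.
    fold_of = rng.permutation(n) % n_folds

    for _ in range(n_iters):
        new_selected = list(selected)
        for k in range(n_folds):
            # Train on the selected windows of all images outside fold k.
            train_idx = [i for i in range(n) if fold_of[i] != k]
            X_pos = np.stack([pos_bags[i][selected[i]] for i in train_idx])
            X = np.vstack([X_pos, neg_windows])
            y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(neg_windows))])
            w = train_linear(X, y)
            # Re-localize only the held-out images with this detector.
            for i in np.flatnonzero(fold_of == k):
                new_selected[i] = int(np.argmax(pos_bags[i] @ w))
        selected = new_selected
    return selected
```

In standard multiple instance learning the same detector both learns from and re-scores each positive image, which in high-dimensional feature spaces can make it simply re-select the window it was trained on. Holding out each fold during re-localization is one way to avoid that, at the cost of training `n_folds` detectors per iteration.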
Funding Information
- AXES
- ERC
- ALLEGRO
- LEAR team
- Inria Grenoble Rhône-Alpes
- Laboratoire Jean Kuntzmann
- CNRS
- University Grenoble Alpes, France