YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
- 1 July 2017
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636919, p. 7464-7473
- https://doi.org/10.1109/cvpr.2017.789
Abstract
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the COCO [32] label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding boxes. Finally, we train and evaluate well-known deep network architectures and report baseline figures for per-frame classification and localization. We also demonstrate how the temporal contiguity of video can potentially be used to improve such inferences. The data set can be found at https://research.google.com/youtube-bb. We hope the availability of such a large curated corpus will spur new advances in video object detection and tracking.
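As a rough sense of scale, the figures quoted in the abstract (approximately 380,000 segments, about 19 s each, annotated at 1 frame per second) imply an order-of-magnitude count of annotated frames. The sketch below is only a back-of-envelope calculation: the helper name `estimate_annotated_frames` is hypothetical, it assumes every segment is exactly 19 s long, and it says nothing about how many of those frames actually contain a bounding box.

```python
def estimate_annotated_frames(num_segments: int,
                              segment_seconds: float,
                              annotation_fps: float) -> int:
    """Rough count of annotated frames: segments x duration x sampling rate.

    Hypothetical helper for illustration only; not part of any data set tooling.
    """
    return round(num_segments * segment_seconds * annotation_fps)


# Figures taken from the abstract; each is approximate ("approximately", "about").
NUM_SEGMENTS = 380_000
SEGMENT_SECONDS = 19   # assumption: every segment is exactly 19 s long
ANNOTATION_FPS = 1     # annotations are sampled at 1 frame per second

total = estimate_annotated_frames(NUM_SEGMENTS, SEGMENT_SECONDS, ANNOTATION_FPS)
print(f"~{total / 1e6:.1f} million annotated frames")  # ~7.2 million annotated frames
```

The result should be read as an estimate of sampled frames under the stated assumptions, not as a reported statistic of the data set.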
This publication has 25 references indexed in Scilit:
- Soylent. Communications of the ACM, 2015
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015
- Caffe. Published by Association for Computing Machinery (ACM), 2014
- The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 2014
- Keep it simple. Published by Association for Computing Machinery (ACM), 2013
- Pay by the bit. Published by Association for Computing Machinery (ACM), 2013
- Amazon's Mechanical Turk. Perspectives on Psychological Science, 2011
- The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2009
- Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007
- Long Short-Term Memory. Neural Computation, 1997