The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes

1 June 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 3234-3243
https://doi.org/10.1109/cvpr.2016.352

Abstract

Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning many parameters from raw images; thus, a sufficient number of diverse images with class annotations is needed. These annotations are obtained via cumbersome human labour, which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this paper, we propose to use a virtual world to automatically generate realistic synthetic images with pixel-level annotations. Then, we address the question of how useful such data can be for semantic segmentation, in particular when using a DCNN paradigm. To answer this question, we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show how the inclusion of SYNTHIA in the training stage significantly improves performance on the semantic segmentation task.

Cited by 1342 articles