Deep Learning Based Object Recognition Using Physically-Realistic Synthetic Depth Scenes

Open Access

6 August 2019

journal article
research article
Published by MDPI AG in Machine Learning and Knowledge Extraction

Vol. 1 (3), 883-903
https://doi.org/10.3390/make1030051

Abstract

Recognizing objects and estimating their poses has a wide range of applications in robotics. For instance, to grasp objects, robots need the position and orientation of those objects in 3D. The task becomes challenging in a cluttered environment containing different types of objects. A popular approach to this problem is to use a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data requires time and extensive human labor for manual labeling. In this study, our objective was to develop and validate a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box, and automatically labeled all objects with an occlusion rate below 70%. The Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture was trained on a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset of 2000 samples. The deep object recognizer achieves 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the deep learning model with noise-added synthetic images improves the recognition accuracy on real images to 46.3%. The object detection framework can therefore be trained on synthetically generated depth data and then employed for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
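Two steps mentioned in the abstract, labeling only objects below a 70% occlusion rate and adding noise to synthetic depth images, can be illustrated with a minimal sketch. The abstract does not specify how occlusion is measured or which noise model was used, so everything here is an assumption for illustration: occlusion is taken as the fraction of an object's pixels hidden by other objects, the noise is zero-mean Gaussian with random pixel dropout, and the function names, dictionary keys, and parameter values (`sigma_m`, `dropout_prob`) are hypothetical.

```python
import random


def occlusion_rate(full_pixels, visible_pixels):
    """Fraction of an object's pixels hidden by other objects (assumed definition)."""
    if full_pixels <= 0:
        raise ValueError("full_pixels must be positive")
    return (full_pixels - visible_pixels) / full_pixels


def select_labels(objects, max_occlusion=0.70):
    """Return the names of objects whose occlusion rate is below the threshold."""
    return [
        obj["name"]
        for obj in objects
        if occlusion_rate(obj["full"], obj["visible"]) < max_occlusion
    ]


def add_depth_noise(depth, sigma_m=0.002, dropout_prob=0.01, rng=None):
    """Perturb a depth image given as a list of rows of depths in meters.

    Assumed noise model, not the one from the paper: each pixel is either
    dropped (set to 0.0, meaning "no reading") with probability dropout_prob,
    or shifted by zero-mean Gaussian noise with standard deviation sigma_m.
    """
    rng = rng or random.Random()
    noisy = []
    for row in depth:
        new_row = []
        for d in row:
            if rng.random() < dropout_prob:
                new_row.append(0.0)
            else:
                # Clamp so that noise never produces a negative depth.
                new_row.append(max(0.0, d + rng.gauss(0.0, sigma_m)))
        noisy.append(new_row)
    return noisy


if __name__ == "__main__":
    # Hypothetical pixel counts for two objects in one synthetic scene.
    scene = [
        {"name": "object_a", "full": 100, "visible": 40},  # 60% occluded
        {"name": "object_b", "full": 100, "visible": 20},  # 80% occluded
    ]
    print(select_labels(scene))

    depth_image = [[0.50, 0.45], [0.40, 0.50]]
    print(add_depth_noise(depth_image, rng=random.Random(0)))
```

In a setup like this, the noisy copies would be generated from the clean synthetic images before training, which leaves the automatically generated labels unchanged.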

Cited by 8 articles