Unsupervised Joint Feature Learning and Encoding for RGB-D Scene Labeling

11 August 2015

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Image Processing

Vol. 24 (11), 4459-4473
https://doi.org/10.1109/tip.2015.2465133

Abstract

Most existing approaches to RGB-D indoor scene labeling employ hand-crafted features for each modality independently and combine them heuristically. There have been some attempts at learning features directly from raw RGB-D data, but their performance is not satisfactory. In this paper, we propose an unsupervised joint feature learning and encoding (JFLE) framework for RGB-D scene labeling. The main novelty of our learning framework lies in the joint optimization of feature learning and feature encoding in a coherent way, which significantly boosts performance. By stacking the basic learning structure, higher-level features are derived and combined with lower-level features to better represent RGB-D data. Moreover, to explore the nonlinear intrinsic characteristics of the data, we further propose a more general joint deep feature learning and encoding (JDFLE) framework that introduces a nonlinear mapping into JFLE. Experimental results on the benchmark NYU depth dataset show that our approaches achieve performance competitive with state-of-the-art methods, while requiring no complex feature handcrafting or feature combination, and that they can be easily applied to other datasets.

Funding Information

Singapore National Research Foundation under its International Research Centre at the Singapore Funding Initiative, and administered by the Interactive Digital Media Programme Office
Ministry of Education (MOE) Tier 1 (RG 138/14)
Singapore MOE Tier 2 (ARC28/14)
Agency for Science, Technology and Research through the Science and Engineering Research Council, Singapore (PSF1321202099)

