Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs

Preprint
Abstract
Articulated hand pose estimation plays an important role in human-computer interaction. Despite recent progress, the accuracy of existing methods is still not satisfactory, partly due to the difficulty of the embedded high-dimensional and non-linear regression problem. Most existing discriminative methods regress the hand pose directly from a single depth image, which cannot fully utilize the depth information. In this paper, we propose a novel multi-view CNN-based approach for 3D hand pose estimation. The query depth image is projected onto multiple planes, and multi-view CNNs are trained to learn the mapping from each projected image to a set of 2D heat-maps that estimate the 2D joint positions on that plane. These multi-view heat-maps are then fused with learned pose priors to produce the final 3D hand pose estimate. Experimental results show that the proposed method outperforms several state-of-the-art methods on two challenging datasets. Moreover, a quantitative cross-dataset experiment and a qualitative experiment further demonstrate the good generalization ability of the proposed method.
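
To make the projection step described in the abstract concrete, the following is a minimal Python sketch (not the authors' implementation) of back-projecting a depth map into a 3D point cloud and rendering it orthographically onto the x-y, y-z, and z-x planes, yielding the multi-view inputs fed to the per-view CNNs. All function names, camera intrinsics, and the binary-occupancy rendering are illustrative assumptions.

```python
import numpy as np

def depth_to_points(depth, fx=588.0, fy=587.0, cx=320.0, cy=240.0):
    """Back-project a depth map (in mm) to a 3D point cloud in camera
    coordinates. The intrinsics are placeholder values, not the paper's."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only pixels with valid depth

def project_views(points, res=96):
    """Orthographically project the point cloud onto the x-y, y-z, and z-x
    planes, producing one normalized 2D image per view."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    views = []
    for a, b in [(0, 1), (1, 2), (2, 0)]:
        img = np.zeros((res, res), dtype=np.float32)
        # Normalize the two in-plane coordinates to [0, 1], then rasterize.
        uv = (points[:, [a, b]] - lo[[a, b]]) / (hi[[a, b]] - lo[[a, b]] + 1e-6)
        ij = np.clip((uv * (res - 1)).astype(int), 0, res - 1)
        img[ij[:, 1], ij[:, 0]] = 1.0  # binary occupancy; a depth channel could be kept instead
        views.append(img)
    return views
```

In the full pipeline, each projected view would be passed to its own CNN that regresses per-joint 2D heat-maps, and the three views' heat-maps would be fused, subject to learned pose priors, to recover the 3D joint locations.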