HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

1 June 2020

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 5385-5394
https://doi.org/10.1109/cvpr42600.2020.00543

Abstract

Bottom-up human pose estimation methods have difficulty predicting the correct pose for small persons because of scale variation. In this paper, we present HigherHRNet, a novel bottom-up human pose estimation method that learns scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to address the scale variation challenge in bottom-up multi-person pose estimation and to localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of the feature map outputs from HRNet and higher-resolution outputs upsampled through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.
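As a rough illustration of the multi-resolution aggregation mentioned in the abstract, the following minimal sketch combines keypoint heatmaps predicted at two resolutions into one. The abstract does not spell out the procedure, so the details here are assumptions: the heatmaps are upsampled with nearest-neighbor repetition and then averaged, and all function names (upsample_nearest, aggregate_heatmaps, argmax_2d) are hypothetical. It is not the authors' implementation.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def upsample_nearest(heatmap: Heatmap, factor: int) -> Heatmap:
    """Repeat every value `factor` times along both axes (nearest neighbor)."""
    out: Heatmap = []
    for row in heatmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def aggregate_heatmaps(heatmaps: List[Heatmap]) -> Heatmap:
    """Resize all heatmaps to the largest resolution and average them.

    Assumption: every heatmap has the same aspect ratio and its height
    divides the largest height evenly.
    """
    target = max(len(h) for h in heatmaps)
    resized = []
    for h in heatmaps:
        if target % len(h) != 0:
            raise ValueError("heatmap sizes must be integer multiples")
        resized.append(upsample_nearest(h, target // len(h)))
    count = len(resized)
    width = len(resized[0][0])
    return [
        [sum(r[y][x] for r in resized) / count for x in range(width)]
        for y in range(target)
    ]


def argmax_2d(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (row, column) of the highest-scoring location."""
    _, y, x = max(
        (value, y, x)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    return y, x


# Toy data: a coarse 2x2 heatmap and a finer 4x4 heatmap for one keypoint.
low_res = [[0.0, 1.0],
           [0.0, 0.0]]
high_res = [[0.0, 0.0, 0.0, 1.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

aggregated = aggregate_heatmaps([low_res, high_res])
peak = argmax_2d(aggregated)
```

In this toy case the coarse map only narrows the keypoint to a 2x2 block of the fine grid, while the finer map selects a single cell within that block, which is the intuition behind using higher-resolution outputs for small persons.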


Cited by 468 articles