Automated Parts-Based Model for Recognizing Human–Object Interactions from Aerial Imagery with Fully Convolutional Network
Open Access
- 19 March 2022
- research article
- Published by MDPI AG in Remote Sensing
- Vol. 14 (6), 1492
- https://doi.org/10.3390/rs14061492
Abstract
Advances in aerial imaging have driven the development of improved human–object interaction (HOI) recognition methods for use in surveillance, security, and public monitoring systems. Despite the ever-increasing volume of HOI research, the persistent challenges of occlusion, scale variation, fast motion, and illumination variation continue to attract researchers. In particular, accurate identification of human body parts, the involved objects, and robust features is key to an effective HOI recognition system. However, detecting every human body part and extracting its features is a tedious and rather ineffective task. Based on the observation that only a few body parts are usually involved in a particular interaction, this article proposes a novel parts-based model for recognizing complex human–object interactions in videos and images captured by ground and aerial cameras. Gamma correction and non-local means denoising are used to pre-process the video frames, and Felzenszwalb's algorithm is applied for image segmentation. After segmentation, twelve human body parts are detected and five of them are shortlisted based on their involvement in the interactions. Four kinds of features are extracted and concatenated into a large feature vector, which is reduced using the t-distributed stochastic neighbor embedding (t-SNE) technique. Finally, the interactions are classified using a fully convolutional network (FCN). The proposed system has been validated on ground and aerial videos from the VIRAT Video, YouTube Aerial, and SYSU 3D HOI datasets, achieving average accuracies of 82.55%, 86.63%, and 91.68%, respectively.

This publication has 40 references indexed in Scilit:
- Contextual Action Recognition with R*CNN. IEEE, 2015
- A spatiotemporal motion variation features extraction approach for human tracking and pose-based action recognition. IEEE, 2015
- Discovering human interactions in videos with limited data labeling. IEEE, 2015
- Real-time life logging via a depth silhouette-based human activity recognition system for smart home services. IEEE, 2014
- Human Activity Recognition via Recognized Body Parts of Human Depth Silhouettes for Residents Monitoring Services at Smart Home. Indoor and Built Environment, 2012
- Human Body Parts Tracking Using Torso Tracking: Applications to Activity Recognition. IEEE, 2012
- Image texture classification using textons. IEEE, 2011
- A large-scale benchmark dataset for event recognition in surveillance video. IEEE, 2011
- Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 2004
- Textons, the elements of texture perception, and their interactions. Nature, 1981
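As an illustrative aside, the gamma-correction pre-processing step mentioned in the abstract can be sketched as follows. This is a minimal pure-Python sketch under assumed parameters, not the authors' implementation: the gamma value of 1.5 and the lookup-table approach are assumptions, and in a full pipeline non-local means denoising (e.g. OpenCV's `fastNlMeansDenoising`) would follow before segmentation.

```python
def gamma_correct(frame, gamma=1.5):
    """Gamma-correct an 8-bit grayscale frame given as nested lists.

    gamma > 1 lifts dark intensities, which helps expose detail in
    poorly lit aerial footage. The value 1.5 is illustrative, not a
    setting taken from the paper.
    """
    # Precompute a lookup table for all 256 possible intensity levels,
    # applying the power-law transform v_out = 255 * (v_in/255)^(1/gamma).
    table = [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]
    # Map every pixel through the table; 0 and 255 are fixed points.
    return [[table[v] for v in row] for row in frame]

frame = [[0, 64, 128, 255]]
corrected = gamma_correct(frame)
print(corrected)  # dark mid-tones are brightened; 0 and 255 unchanged
```

Using a lookup table means the power-law transform is computed only 256 times per frame rather than once per pixel, which matters when pre-processing long video sequences.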