Automated Parts-Based Model for Recognizing Human–Object Interactions from Aerial Imagery with Fully Convolutional Network
Open Access
- 19 March 2022
- research article
- Published by MDPI AG in Remote Sensing
- Vol. 14 (6), 1492
- https://doi.org/10.3390/rs14061492
Abstract
Advances in aerial imaging have driven the development of improved human–object interaction (HOI) recognition methods for use in surveillance, security, and public monitoring systems. Despite the ever-increasing volume of HOI research, the persistent challenges of occlusion, scale variation, fast motion, and illumination variation continue to attract researchers. In particular, accurate identification of human body parts, the involved objects, and robust features is key to an effective HOI recognition system. However, detecting every human body part and extracting its features is a tedious and rather ineffective task. Based on the observation that only a few body parts are usually involved in a particular interaction, this article proposes a novel parts-based model for recognizing complex human–object interactions in videos and images captured by ground and aerial cameras. Gamma correction and non-local means denoising are used to pre-process the video frames, and Felzenszwalb's algorithm is applied for image segmentation. After segmentation, twelve human body parts are detected and five of them are shortlisted based on their involvement in the interactions. Four kinds of features are extracted and concatenated into a large feature vector, which is reduced using the t-distributed stochastic neighbor embedding (t-SNE) technique. Finally, the interactions are classified using a fully convolutional network (FCN). The proposed system has been validated on ground and aerial videos from the VIRAT Video, YouTube Aerial, and SYSU 3D HOI datasets, achieving average accuracies of 82.55%, 86.63%, and 91.68%, respectively.

This publication has 40 references indexed in Scilit:
- Contextual Action Recognition with R*CNN. IEEE, 2015
- A spatiotemporal motion variation features extraction approach for human tracking and pose-based action recognition. IEEE, 2015
- Discovering human interactions in videos with limited data labeling. IEEE, 2015
- Real-time life logging via a depth silhouette-based human activity recognition system for smart home services. IEEE, 2014
- Human Activity Recognition via Recognized Body Parts of Human Depth Silhouettes for Residents Monitoring Services at Smart Home. Indoor and Built Environment, 2012
- Human Body Parts Tracking Using Torso Tracking: Applications to Activity Recognition. IEEE, 2012
- Image texture classification using textons. IEEE, 2011
- A large-scale benchmark dataset for event recognition in surveillance video. IEEE, 2011
- Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 2004
- Textons, the elements of texture perception, and their interactions. Nature, 1981
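As an illustrative aside, the gamma-correction pre-processing step mentioned in the abstract can be sketched as follows. This is a minimal pure-Python sketch under assumed parameters, not the authors' implementation: the gamma value of 1.5 and the lookup-table approach are assumptions, and in a full pipeline non-local means denoising (e.g. OpenCV's `fastNlMeansDenoising`) would follow before segmentation.

```python
def gamma_correct(frame, gamma=1.5):
    """Gamma-correct an 8-bit grayscale frame given as nested lists.

    gamma > 1 lifts dark intensities, which helps expose detail in
    poorly lit aerial footage. The value 1.5 is illustrative, not a
    setting taken from the paper.
    """
    # Precompute a lookup table for all 256 possible intensity levels,
    # applying the power-law transform v_out = 255 * (v_in/255)^(1/gamma).
    table = [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]
    # Map every pixel through the table; 0 and 255 are fixed points.
    return [[table[v] for v in row] for row in frame]

frame = [[0, 64, 128, 255]]
corrected = gamma_correct(frame)
print(corrected)  # dark mid-tones are brightened; 0 and 255 unchanged
```

Using a lookup table means the power-law transform is computed only 256 times per frame rather than once per pixel, which matters when pre-processing long video sequences.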