Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection
Open Access
- 12 January 2022
- journal article
- research article
- Published by Frontiers Media SA in Frontiers in Neurorobotics
- Vol. 15, 824592
- https://doi.org/10.3389/fnbot.2021.824592
Abstract
Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in computer vision and artificial intelligence. Previous AU detection studies usually encode complex regional feature representations using manually defined facial landmarks and model the relationships among AUs with graph neural networks. Although some progress has been achieved, existing methods still find it cumbersome to capture the exclusive and concurrent relationships among different combinations of facial AUs. To circumvent this issue, we propose a new progressive multi-scale vision transformer (PMVT) that captures the complex relationships among different AUs across a wide range of expressions in a data-driven fashion. PMVT is based on a multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are twofold: (i) PMVT does not rely on manually defined facial landmarks to extract regional representations, and (ii) PMVT can encode facial regions with adaptive receptive fields, which allows different AUs to be represented flexibly. Experimental results show that PMVT improves AU detection accuracy on the popular BP4D and DISFA datasets and obtains consistent improvements over other state-of-the-art AU detection methods. Visualization results show that PMVT automatically perceives the discriminative facial regions for robust AU detection.
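The idea of attending to image patches at more than one scale can be illustrated with a minimal sketch. It is not the PMVT architecture from the paper: it assumes a toy 8x8 grayscale image, non-overlapping patches, and single-head scaled dot-product attention with identity query/key/value projections (no learned weights, no progressive fusion between scales). The names `patchify`, `self_attention`, and `multi_scale_tokens` are hypothetical and chosen only for this illustration.

```python
import math


def patchify(image, patch):
    """Split a square 2-D image (list of rows) into flattened, non-overlapping patches."""
    size = len(image)
    tokens = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so nothing here is learned.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * value[i] for w, value in zip(weights, tokens))
                         for i in range(dim)])
    return attended


def multi_scale_tokens(image, patch_sizes=(2, 4)):
    """Run self-attention separately over patch sequences of each size."""
    return {p: self_attention(patchify(image, p)) for p in patch_sizes}


# Toy 8x8 image with values in [0, 1).
image = [[(r * 8 + c) / 64.0 for c in range(8)] for r in range(8)]
features = multi_scale_tokens(image)
for patch, tokens in features.items():
    print(patch, len(tokens), len(tokens[0]))
# prints "2 16 4" then "4 4 16"
```

Smaller patches yield more tokens with a narrower view of the face, while larger patches yield fewer tokens that each cover a wider region. A real model would add learned projections, positional embeddings, and a mechanism for combining the scales.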