TY - JOUR
T1 - Constrained and directional ensemble attention for facial action unit detection
AU - Shao, Zhiwen
AU - Chen, Bikuan
AU - Zhou, Yong
AU - Shi, Xuehuai
AU - Li, Canlin
AU - Ma, Lizhuang
AU - Yeung, Dit Yan
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2026/1
Y1 - 2026/1
N2 - Facial action unit (AU) detection is a challenging task, due to the subtlety of each AU in its local area and the correlations among AUs across the global face. In recent years, the prevailing attention mechanism has been introduced to AU detection. However, the inherent mechanism of the self-attention weight distribution has rarely been explored. Besides, ensemble learning is an efficient technique, but has received little attention in AU detection. Considering the above limitations, we propose a local self-attention constraining (LSC) network, which regards the self-attention distribution of each AU as a spatial distribution and constrains it based on prior knowledge so as to capture AU-related local information. Moreover, to learn correlations among different AU regions, we propose a global dual-directional attention (GDA) network, which adaptively learns a global attention map from both the vertical and horizontal directions. Finally, the two networks, which capture patterns from different views, are assembled to integrate the advantages of both. Extensive experiments on the BP4D, DISFA, and GFT benchmarks demonstrate that our methods, including local self-attention constraining, global dual-directional attention, and multi-view ensemble, all significantly surpass state-of-the-art AU detection works.
AB - Facial action unit (AU) detection is a challenging task, due to the subtlety of each AU in its local area and the correlations among AUs across the global face. In recent years, the prevailing attention mechanism has been introduced to AU detection. However, the inherent mechanism of the self-attention weight distribution has rarely been explored. Besides, ensemble learning is an efficient technique, but has received little attention in AU detection. Considering the above limitations, we propose a local self-attention constraining (LSC) network, which regards the self-attention distribution of each AU as a spatial distribution and constrains it based on prior knowledge so as to capture AU-related local information. Moreover, to learn correlations among different AU regions, we propose a global dual-directional attention (GDA) network, which adaptively learns a global attention map from both the vertical and horizontal directions. Finally, the two networks, which capture patterns from different views, are assembled to integrate the advantages of both. Extensive experiments on the BP4D, DISFA, and GFT benchmarks demonstrate that our methods, including local self-attention constraining, global dual-directional attention, and multi-view ensemble, all significantly surpass state-of-the-art AU detection works.
KW - Dual-directional attention
KW - Facial action unit detection
KW - Multi-view ensemble
KW - Self-attention constraining
UR - https://www.scopus.com/pages/publications/105007656219
U2 - 10.1016/j.patcog.2025.111904
DO - 10.1016/j.patcog.2025.111904
M3 - Article
AN - SCOPUS:105007656219
SN - 0031-3203
VL - 169
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 111904
ER -