Facial action unit (AU) detection is challenging because AUs are subtle and dynamic, which makes their correlated information difficult to capture. Existing methods often localize the regions correlated with each AU, but both common strategies have drawbacks: local AU attentions predefined from correlated facial landmarks often discard essential parts, while learned global attention maps often include irrelevant areas. Furthermore, existing relational reasoning methods often apply common patterns to all AUs and ignore the specific characteristics of each AU. To address these limitations, we propose a novel adaptive attention and relation (AAR) framework for facial AU detection. Specifically, we propose an adaptive attention regression network that regresses the global attention map of each AU under the constraint of attention predefinition and the guidance of AU detection. This helps capture both the landmark-specified dependencies in strongly correlated regions and the globally distributed facial dependencies in weakly correlated regions. Moreover, considering the diversity and dynamics of AUs, we propose an adaptive spatio-temporal graph convolutional network that simultaneously reasons about the independent pattern of each AU, the inter-dependencies among AUs, and the temporal dependencies. Extensive experiments show that our approach (i) achieves competitive performance on challenging benchmarks, including BP4D, DISFA, and GFT in constrained scenarios and Aff-Wild2 in unconstrained scenarios, and (ii) can precisely learn the regional correlation distribution of each AU.
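As a rough illustration of the idea of regressing an attention map under the constraint of a landmark-based predefinition, the sketch below builds a predefined map from landmark centers and measures how far a regressed map deviates from it. The Gaussian form, the function names `predefined_attention` and `attention_constraint_loss`, the `sigma` parameter, and the mean-squared-error penalty are all assumptions made for illustration; the abstract does not specify them, and this is not the AAR implementation.

```python
import math


def predefined_attention(height, width, centers, sigma=3.0):
    """Build a height x width attention map with values in [0, 1].

    Each (row, col) entry in `centers` is a hypothetical AU center derived
    from facial landmarks. The Gaussian falloff is an assumed form.
    """
    attention = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for center_r, center_c in centers:
                dist_sq = (r - center_r) ** 2 + (c - center_c) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response over all centers of this AU.
                attention[r][c] = max(attention[r][c], value)
    return attention


def attention_constraint_loss(regressed, predefined):
    """Mean squared error between a regressed map and the predefined map.

    This stands in for a constraint term; the actual loss is an assumption.
    """
    total = 0.0
    count = 0
    for regressed_row, predefined_row in zip(regressed, predefined):
        for regressed_value, predefined_value in zip(regressed_row, predefined_row):
            total += (regressed_value - predefined_value) ** 2
            count += 1
    return total / count


# Usage: two hypothetical centers for one AU on an 8 x 8 feature map.
prior = predefined_attention(8, 8, [(2, 2), (5, 5)])
uniform = [[0.5] * 8 for _ in range(8)]
loss = attention_constraint_loss(uniform, prior)
```

In a setup like this, the constraint term would presumably be weighted against the AU detection loss, so that the regressed map can depart from the landmark prior where weakly correlated regions carry useful information.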