We present a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations. First, multiple expression classes share inherently similar underlying facial appearance, and their differences can be subtle. Second, facial expressions manifest in multiple facial regions simultaneously, so recognition requires a holistic approach that encodes high-order interactions among local features. To address these issues, we propose DAN with three key components: the Feature Clustering Network (FCN), the Multi-head cross Attention Network (MAN), and the Attention Fusion Network (AFN). The FCN extracts robust features by adopting a large-margin learning objective to maximize class separability. The MAN instantiates a number of attention heads that simultaneously attend to multiple facial areas and build attention maps on these regions. The AFN then distracts these attentions to multiple locations before fusing the attention maps into a comprehensive one. Extensive experiments on three public datasets (AffectNet, RAF-DB, and SFEW 2.0) verify that the proposed method consistently achieves state-of-the-art facial expression recognition performance. Code will be made available at https://github.com/yaoing/DAN.
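The idea of several attention heads attending to different facial regions, followed by a fusion step, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the heads are scored or how the maps are fused, so the dot-product scoring, the element-wise maximum fusion, and every name below (`head_attention`, `fuse_max`, `attended_feature`, the toy features and weights) are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(features, weights):
    """One hypothetical head: dot-product score per location, then softmax.

    features: list of per-location feature vectors.
    weights:  this head's scoring vector (a stand-in for learned parameters).
    """
    scores = [sum(f * w for f, w in zip(feat, weights)) for feat in features]
    return softmax(scores)


def fuse_max(attention_maps):
    """Assumed fusion rule: element-wise max across heads, renormalized to sum to 1."""
    fused = [max(column) for column in zip(*attention_maps)]
    total = sum(fused)
    return [v / total for v in fused]


def attended_feature(features, attention):
    """Attention-weighted sum of the per-location feature vectors."""
    dim = len(features[0])
    return [
        sum(a * feat[d] for a, feat in zip(attention, features))
        for d in range(dim)
    ]


# Toy input (illustrative only): 4 locations, each with a 3-dimensional feature.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

# Two hypothetical heads whose scoring vectors favor different locations.
head_weights = [
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
]

maps = [head_attention(features, w) for w in head_weights]
fused = fuse_max(maps)
pooled = attended_feature(features, fused)
```

In this toy setup the two heads peak at different locations, and the fused map keeps both peaks rather than collapsing onto one. A trained model would learn the head parameters and would need an explicit mechanism to keep the heads from attending to the same region, which this sketch does not model.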