On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition (FER) encounters the challenges of inter-class similarity and intra-class variance, making it difficult to mine effective features. We aim to address this solely by leveraging the feature similarity among facial samples. We introduce Cross Similarity Attention (CSA), an input-output position-sensitive attention mechanism that harnesses feature similarity across different images to compute the corresponding global spatial attention. Based on this, we propose a four-branch circular framework, called Quadruplet Cross Similarity (QCS), to extract discriminative features from the same class and eliminate redundant ones from different classes synchronously, refining cleaner features. The symmetry of the network ensures balanced and stable training and reduces the number of CSA interaction matrices required. Contrastive residual distillation is utilized to transfer the information learned in the cross module back to the base network. The cross-attention module exists only during training, and a single base branch is retained during inference. Our proposed QCS model outperforms state-of-the-art methods on several popular FER datasets, without requiring additional landmark information or other extra training data. The code is available at https://github.com/birdwcp/QCS.
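To make the idea of computing spatial attention from cross-image feature similarity concrete, here is a minimal sketch in NumPy. It is a generic cosine-similarity cross-attention between the flattened feature maps of two images, not the paper's exact CSA formulation; all names (`cross_similarity_attention`, `feat_a`, `feat_b`) are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_similarity_attention(feat_a, feat_b):
    """Sketch of cross-image attention from feature similarity.

    feat_a, feat_b: (HW, C) flattened spatial feature maps of two images.
    Returns features for image A, re-weighted by their similarity to
    the spatial positions of image B.
    """
    # L2-normalize so the dot product becomes a cosine-style similarity.
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T                # (HW_a, HW_b) cross-image interaction matrix
    attn = softmax(sim, axis=1)  # attend over the spatial positions of image B
    return attn @ feat_b         # (HW_a, C) cross-attended features

rng = np.random.default_rng(0)
fa = rng.standard_normal((16, 8))  # e.g. a 4x4 feature map, 8 channels
fb = rng.standard_normal((16, 8))
out = cross_similarity_attention(fa, fb)
print(out.shape)  # (16, 8)
```

In the actual QCS framework this interaction matrix is shared across symmetric branch pairs, which is what reduces the number of matrices the four-branch framework must compute.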