Facial action unit (AU) detection has long faced the challenge of capturing subtle feature differences when AUs activate. Existing methods often rely on encoding pixel-level AU information, which not only introduces redundant information but also increases model complexity and limits generalizability. In addition, AU detection accuracy suffers from the class imbalance of each AU type and from the presence of noisy and false AU labels. In this paper, we introduce a novel contrastive learning framework for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features for accurate AU detection. To tackle class imbalance, we employ a negative sample re-weighting strategy that adjusts the parameter-update step size for minority- and majority-class samples. To address noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs, which allows us to inject self-supervised signals into the supervised signal and effectively mitigate the adverse effects of noisy labels. Experiments on four widely used benchmark datasets (BP4D, DISFA, GFT and Aff-Wild2) demonstrate the superior performance of our approach over state-of-the-art AU detection methods. Our code is available at \url{https://github.com/Ziqiao-Shang/AUNCE}.
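To make the negative re-weighting idea concrete, the following is a minimal, hypothetical sketch of an InfoNCE-style contrastive loss in which each negative term carries a weight (e.g. larger weights for majority-class negatives, so minority-class anchors receive effectively larger update steps). The function name, signature, and weighting scheme are illustrative assumptions, not the exact loss used in the paper; the actual implementation is in the linked repository.

```python
import math

def weighted_nce_loss(pos_sim, neg_sims, neg_weights, temperature=0.1):
    """Illustrative InfoNCE-style loss with per-negative re-weighting.

    pos_sim:     similarity of the anchor to its positive sample.
    neg_sims:    similarities of the anchor to each negative sample.
    neg_weights: one weight per negative; up-weighting a negative
                 enlarges its gradient contribution (a hypothetical
                 stand-in for the paper's class-imbalance strategy).
    """
    pos = math.exp(pos_sim / temperature)
    neg = sum(w * math.exp(s / temperature)
              for s, w in zip(neg_sims, neg_weights))
    # Standard contrastive objective: pull the positive pair together,
    # push the (weighted) negatives apart.
    return -math.log(pos / (pos + neg))

# Up-weighting the negatives strictly increases the loss,
# hence the gradient step taken on such anchors.
uniform = weighted_nce_loss(0.8, [0.2, 0.1], [1.0, 1.0])
upweighted = weighted_nce_loss(0.8, [0.2, 0.1], [2.0, 2.0])
```

With uniform weights the expression reduces to the standard InfoNCE loss, so the re-weighting can be seen as a drop-in modification of the negative partition term.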