For the Facial Action Unit (AU) detection task, accurately capturing the subtle facial differences between distinct AUs is essential for reliable detection. AU detection is further challenged by class imbalance and by noisy or false labels, both of which degrade detection accuracy. In this paper, we introduce a novel contrastive learning framework for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features. To tackle class imbalance, we employ a negative sample re-weighting strategy that adjusts the parameter-update step size for minority-class and majority-class samples. Moreover, to address noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Experiments on five widely used benchmark datasets (BP4D, DISFA, BP4D+, GFT and Aff-Wild2) show that our approach outperforms state-of-the-art AU detection methods. Our code is available at https://github.com/Ziqiao-Shang/AUNCE.
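To make the role of negative re-weighting concrete, the following is a rough, generic sketch of an InfoNCE-style contrastive loss with per-negative weights. It is not the AUNCE loss itself, whose exact form is defined in the paper and repository. The function names (`cosine`, `weighted_nce`), the temperature `tau=0.1`, and the idea of passing weights directly are illustrative assumptions. The sketch shows one general mechanism: a weight w_n multiplies negative n's term in the denominator, and therefore also scales that negative's contribution to the gradient. This is one way a re-weighting scheme can change update step sizes for different classes.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def weighted_nce(
    anchor: Sequence[float],
    positives: List[Sequence[float]],
    negatives: List[Sequence[float]],
    neg_weights: List[float],
    tau: float = 0.1,
) -> float:
    """Generic weighted InfoNCE loss for one anchor (hypothetical sketch).

    `positives` may mix supervised pairs (same AU label) and
    self-supervised pairs (augmented views of the anchor).
    `neg_weights` is an assumed per-negative re-weighting, e.g. larger
    weights for negatives whose gradients should be amplified.
    """
    if len(negatives) != len(neg_weights):
        raise ValueError("one weight per negative is required")
    if not positives:
        raise ValueError("at least one positive is required")
    # Weighted sum over negatives: w_n scales negative n's share of the
    # denominator, and hence its share of the gradient.
    neg_term = sum(
        w * math.exp(cosine(anchor, n) / tau)
        for n, w in zip(negatives, neg_weights)
    )
    losses = []
    for p in positives:
        pos = math.exp(cosine(anchor, p) / tau)
        # -log of the positive's probability against the weighted negatives
        losses.append(-math.log(pos / (pos + neg_term)))
    # Average over all positive pairs of this anchor
    return sum(losses) / len(losses)
```

With all weights set to 1, the function reduces to the standard InfoNCE loss averaged over positives. Raising the weights of a group of negatives increases the loss and pushes the anchor away from those negatives more strongly.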