Owing to the similarity between facial expression categories, the presence of compound facial expressions, and the subjectivity of annotators, facial expression recognition (FER) datasets often suffer from ambiguity and noisy labels. Ambiguous expressions are difficult to distinguish from expressions with noisy labels, which hurts the robustness of FER models. Furthermore, recognition difficulty varies across expression categories, so applying one uniform approach to all expressions is unfair. In this paper, we introduce a novel approach called Adaptive Sample Mining (ASM) that dynamically addresses ambiguity and noise within each expression category. First, the Adaptive Threshold Learning module generates two thresholds for each category, a clean threshold and a noisy threshold. These thresholds are based on the mean class probabilities at each training epoch. Next, the Sample Mining module partitions the dataset into three subsets (clean, ambiguous, and noisy) by comparing each sample's confidence with the clean and noisy thresholds of its category. Finally, the Tri-Regularization module applies a mutual learning strategy to the ambiguous subset to enhance discrimination ability, and an unsupervised learning strategy to the noisy subset to mitigate the impact of noisy labels. Extensive experiments demonstrate that our method effectively mines both ambiguous and noisy samples, and outperforms state-of-the-art (SOTA) methods on both synthetic noisy and original datasets. The supplementary material is available at https://github.com/zzzzzzyang/ASM.
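The thresholding and partitioning steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract states only that the thresholds are based on mean class probabilities, so the exact rule used here is an assumption. The clean threshold of a category is taken as the mean confidence of the samples labeled with that category, and the noisy threshold is a hypothetical fraction (`noisy_ratio`) of it. The function names `class_thresholds` and `mine_samples` are likewise illustrative.

```python
import numpy as np

# Subset identifiers used by this sketch (hypothetical names).
CLEAN, AMBIGUOUS, NOISY = 0, 1, 2


def class_thresholds(probs, labels, noisy_ratio=0.5):
    """Return per-class (clean, noisy) thresholds.

    probs:  (N, C) array of predicted class probabilities.
    labels: (N,) array of integer labels as given in the dataset.

    Assumption: the clean threshold of class c is the mean probability the
    model assigns to c over the samples labeled c, and the noisy threshold
    is noisy_ratio times that value. The abstract does not give the formula.
    """
    num_classes = probs.shape[1]
    # Confidence = probability assigned to each sample's given label.
    conf = probs[np.arange(len(labels)), labels]
    clean = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            clean[c] = conf[mask].mean()
    noisy = noisy_ratio * clean
    return clean, noisy


def mine_samples(probs, labels, clean_thr, noisy_thr):
    """Assign each sample to the clean, ambiguous, or noisy subset."""
    conf = probs[np.arange(len(labels)), labels]
    subset = np.full(len(labels), AMBIGUOUS)
    subset[conf >= clean_thr[labels]] = CLEAN   # confident in the given label
    subset[conf < noisy_thr[labels]] = NOISY    # given label looks wrong
    return subset


# Toy batch: five samples, two expression categories.
probs = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
    [0.2, 0.8],
    [0.4, 0.6],
])
labels = np.array([0, 0, 0, 1, 1])

clean_thr, noisy_thr = class_thresholds(probs, labels)
subset = mine_samples(probs, labels, clean_thr, noisy_thr)
print(subset.tolist())  # [0, 1, 2, 0, 1]
```

In a training loop, the thresholds would be recomputed every epoch so that they track how well each category is currently learned. The three subsets would then feed different loss terms, which this sketch leaves out.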