Annotation ambiguity caused by the inherent subjectivity of visual judgment has long been a major challenge for Facial Expression Recognition (FER), particularly on large-scale datasets collected in the wild. A potential solution is to estimate relatively objective emotional distributions that help mitigate the ambiguity of subjective annotations. To this end, this paper proposes a novel Prior-based Objective Inference (POI) network. The network employs prior knowledge to derive a more objective and varied emotional distribution and tackles subjective annotation ambiguity through dynamic knowledge transfer. POI comprises two key networks. First, the Prior Inference Network (PIN) exploits prior knowledge of Action Units (AUs) and emotions to capture intricate facial motion details. To reduce over-reliance on priors and facilitate objective emotional inference, PIN aggregates inferential knowledge from several key facial subregions, encouraging mutual learning among them. Second, the Target Recognition Network (TRN) integrates subjective emotion annotations with the objective inference soft labels provided by PIN, fostering an understanding of the inherent diversity of facial expressions and thereby resolving annotation ambiguity. Moreover, we introduce an uncertainty estimation module to quantify and balance facial expression confidence, enabling a flexible treatment of the uncertainty in subjective annotations. Extensive experiments show that POI achieves competitive performance on both synthetic noisy datasets and multiple real-world datasets. All code and training logs will be made publicly available at https://github.com/liuhw01/POI.
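The knowledge-transfer idea described above — letting the target network learn jointly from subjective hard annotations and objective soft labels, balanced by an estimated confidence — can be sketched as a simple weighted loss. This is only an illustrative sketch under assumed design choices (cross-entropy for the hard label, KL divergence to the soft label, a temperature-scaled student distribution, and a scalar `confidence` in [0, 1]); the paper's actual loss formulation and uncertainty module may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transfer_loss(logits, hard_label, soft_label, confidence, temperature=2.0):
    """Illustrative knowledge-transfer loss (not the paper's exact formulation).

    Blends cross-entropy on the subjective hard annotation with a KL term
    pulling the prediction toward the objective soft-label distribution.
    `confidence` (assumed to come from an uncertainty estimation module)
    decides how much the hard label is trusted over the soft label.
    """
    probs = softmax(logits)
    # Cross-entropy on the single subjective annotation.
    ce = -np.log(probs[hard_label] + 1e-12)
    # KL divergence from the soft-label distribution to a temperature-
    # smoothed student distribution, as in standard distillation setups.
    q = softmax(logits / temperature)
    kl = np.sum(soft_label * (np.log(soft_label + 1e-12) - np.log(q + 1e-12)))
    return confidence * ce + (1.0 - confidence) * kl
```

With `confidence = 1.0` the loss reduces to plain cross-entropy on the annotation; with `confidence = 0.0` the network is trained purely toward the inferred objective distribution, which is how an uncertainty estimate can down-weight ambiguous subjective labels.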