Group-level emotion recognition (GER) is an integral part of human behavior analysis, aiming to recognize the overall emotion in a multi-person scene. However, existing methods focus on combining diverse emotion cues while ignoring the inherent uncertainties of unconstrained environments, such as congestion and occlusion within a group. Additionally, since only group-level labels are available, inconsistent emotion predictions among individuals in one group can confuse the network. In this paper, we propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER. By explicitly modeling the uncertainty of each individual, we use stochastic embeddings drawn from a Gaussian distribution instead of deterministic point embeddings. This representation captures the probabilities of different emotions and, through its stochasticity, generates diverse predictions during the inference stage. Furthermore, uncertainty-sensitive scores are adaptively assigned as the fusion weights of the individual faces within each group. Moreover, we develop an image enhancement module to improve the model's robustness against severe noise. The overall three-branch model, encompassing face, object, and scene components, is guided by a proportional-weighted fusion strategy and integrates the proposed uncertainty-aware method to produce the final group-level output. Experimental results demonstrate the effectiveness and generalization ability of our method across three widely used databases.
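
To make the idea of stochastic embeddings and uncertainty-weighted fusion more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each face is described by a predicted mean `mu` and log-variance `log_var`, samples an embedding with the standard reparameterization `z = mu + sigma * eps`, and fuses faces with weights that decrease as the predicted variance grows. The abstract does not specify how the uncertainty-sensitive scores are computed, so the softmax over negative mean variance used here, along with all function names and input values, is an assumption made purely for illustration.

```python
import numpy as np


def sample_embeddings(mu, log_var, rng):
    """Draw one stochastic embedding per face: z = mu + sigma * eps."""
    sigma = np.exp(0.5 * log_var)          # standard deviation from log-variance
    eps = rng.standard_normal(mu.shape)    # unit Gaussian noise
    return mu + sigma * eps


def uncertainty_weights(log_var):
    """Hypothetical scoring rule: softmax over negative mean variance per face.

    Faces with larger predicted variance receive smaller fusion weights.
    """
    uncertainty = np.exp(log_var).mean(axis=1)   # shape: (n_faces,)
    scores = -uncertainty
    scores = scores - scores.max()               # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def fuse_group(mu, log_var, rng):
    """Fuse sampled face embeddings into one group-level embedding."""
    z = sample_embeddings(mu, log_var, rng)
    weights = uncertainty_weights(log_var)
    return weights @ z, weights


# Illustrative inputs: three faces with 2-D embeddings (values are made up).
rng = np.random.default_rng(0)
mu = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
log_var = np.array([[-4.0, -4.0], [0.0, 0.0], [-2.0, -2.0]])

group_embedding, weights = fuse_group(mu, log_var, rng)
```

In this sketch the first face has the lowest variance and therefore the largest weight, while the second, most uncertain face contributes least. Calling `fuse_group` repeatedly yields different group embeddings, which mirrors how sampling can produce diverse predictions at inference time.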