In this study, we investigate how environmental factors, specifically the scenes and objects involved, affect the expression of emotions through body language. To this end, we introduce BEE-NET, a novel multi-stream deep convolutional neural network. We also propose a new late fusion strategy that incorporates meta-information on places and objects as prior knowledge in the learning process. Our proposed probabilistic pooling model leverages this information to generate a joint probability distribution over both the available contextual information and the anticipated non-available contextual information in latent space. Importantly, our fusion strategy is differentiable, which allows end-to-end training and captures hidden associations among data points without requiring further post-processing or regularisation. To evaluate our deep model, we use the Body Language Database (BoLD), currently the largest available database for the Automatic Identification of the in-the-wild Bodily Expression of Emotions (AIBEE). Our experimental results demonstrate that the proposed approach surpasses the current state of the art in AIBEE by a margin of 2.07%, achieving an Emotional Recognition Score of 66.33%.