This paper presents the results of the SUN team for the Compound Expressions Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task, so the problem is treated as a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. The proposed method achieves an F1-score of 22.01% on the C-EXPR-DB test subset. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of basic and compound human emotions.
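The two-stage idea described above (probability-level fusion followed by rule-based compound prediction) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the basic emotion set, the weighted-average fusion, the list of compound classes, and the rule that scores a compound expression as the sum of its two constituent basic-emotion probabilities. The function names `fuse_probabilities` and `predict_compound` are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

# Assumed basic emotion set; the abstract does not enumerate it.
BASIC: Tuple[str, ...] = (
    "neutral", "anger", "disgust", "fear", "happiness", "sadness", "surprise",
)

# Assumed rule table: each compound expression maps to two basic emotions.
COMPOUND_RULES: Dict[str, Tuple[str, str]] = {
    "fearfully_surprised": ("fear", "surprise"),
    "happily_surprised": ("happiness", "surprise"),
    "sadly_surprised": ("sadness", "surprise"),
    "disgustedly_surprised": ("disgust", "surprise"),
    "angrily_surprised": ("anger", "surprise"),
    "sadly_fearful": ("sadness", "fear"),
    "sadly_angry": ("sadness", "anger"),
}


def fuse_probabilities(
    per_modality: Mapping[str, Mapping[str, float]],
    weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Fuse per-modality emotion probabilities with a weighted average."""
    names = list(per_modality)
    if weights is None:
        # Equal weights are an assumption, not the authors' setting.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return {
        emotion: sum(weights[n] * per_modality[n][emotion] for n in names) / total
        for emotion in BASIC
    }


def predict_compound(fused: Mapping[str, float]) -> str:
    """Pick the compound class whose two constituent emotions score highest."""
    scores = {
        label: fused[first] + fused[second]
        for label, (first, second) in COMPOUND_RULES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up probabilities for a single frame, for illustration only.
    audio = {"neutral": 0.10, "anger": 0.05, "disgust": 0.05, "fear": 0.40,
             "happiness": 0.05, "sadness": 0.05, "surprise": 0.30}
    video = {"neutral": 0.10, "anger": 0.05, "disgust": 0.05, "fear": 0.20,
             "happiness": 0.05, "sadness": 0.05, "surprise": 0.50}
    fused = fuse_probabilities({"audio": audio, "video": video})
    print(predict_compound(fused))
```

Because the rule table is fixed in advance and the underlying models are trained only on basic emotions, a pipeline of this shape needs no compound-expression training data, which is what makes the setting zero-shot.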