Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because compound expressions exist, human emotional expression is complex, and judging it requires considering both local and global facial cues. To address this issue, we propose an ensemble learning approach to Compound Expression Recognition. We treat the task as classification and train three expression classification models, based respectively on a convolutional network, a Vision Transformer, and a multi-scale local attention network. We then ensemble the models by late fusion, merging their outputs to predict the final result. Our method achieves high accuracy on RAF-DB and can recognize expressions in a zero-shot setting on certain portions of C-EXPR-DB.
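The late-fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so this example assumes a (optionally weighted) average of per-model softmax probabilities; the function names `softmax` and `late_fusion`, the three-class setup, and the logit values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def late_fusion(logits_per_model, weights=None):
    """Fuse per-model logits by averaging class probabilities.

    Assumption: fusion is a weighted mean of softmax outputs. The
    paper's actual rule may differ (e.g. voting or learned weights).
    logits_per_model: list of arrays, each of shape (n_samples, n_classes).
    Returns (predicted class indices, fused probabilities).
    """
    probs = np.stack([softmax(np.asarray(l, dtype=float))
                      for l in logits_per_model])    # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(probs))                # equal weight per model
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize so weights sum to 1
    fused = np.tensordot(weights, probs, axes=1)     # (n_samples, n_classes)
    return fused.argmax(axis=-1), fused

# Made-up logits for 2 samples and 3 classes from three hypothetical models.
cnn_logits   = [[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]]   # convolutional network
vit_logits   = [[1.0, 1.2, 0.0], [0.1, 2.0, 0.4]]   # Vision Transformer
local_logits = [[1.8, 0.2, 0.3], [0.0, 0.5, 2.2]]   # multi-scale local attention network

preds, fused = late_fusion([cnn_logits, vit_logits, local_logits])
print(preds)  # fused class index per sample
```

In this toy input the Vision Transformer alone would pick a different class for both samples, while the averaged probabilities follow the two models that agree; this is the kind of disagreement late fusion is meant to smooth out.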