Owing to the widespread use of conversations in human-computer interaction, Conversation Emotion Recognition (CER) has attracted increasing attention from researchers. In real-world scenarios, the emotional states of both participants in a conversation tend to remain relatively stable within the local context. At the same time, conversational data are often incomplete, with one or more modalities missing. To address these two key challenges, we propose a novel framework for incomplete multimodal learning in CER, called the "Inverted Teacher-studEnt seArch Conversation Network (ITEACNet)." ITEACNet comprises two novel components: the "Emotion Context Changing Encoder (ECCE)" and the "Inverted Teacher-Student framework (ITS)." ECCE models context changes from both local and global perspectives. ITS lets a simple teacher model learn from complete data, so that a complex student model given incomplete data can approach the teacher's performance. Furthermore, we employ a Neural Architecture Search algorithm to strengthen the student model, further improving performance. Finally, to better reflect real-world conditions, we introduce a new evaluation protocol that tests the model under different missing rates without altering its weights. Experiments on three benchmark CER datasets show that ITEACNet outperforms existing methods in incomplete multimodal CER.
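The abstract does not specify the ITS training loss or how modalities are removed during evaluation. The following is therefore only a minimal sketch under stated assumptions. Missing modalities are zero-filled. Each modality is dropped independently with probability equal to the missing rate, and at least one modality is always kept. The teacher-to-student signal is a standard temperature-scaled knowledge-distillation loss, with the teacher seeing complete data and the student seeing masked data. The names `mask_modalities`, `distillation_loss`, `evaluate_across_missing_rates` and `toy_model` are hypothetical and do not come from the paper.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: modality name -> feature vector for one utterance.
Features = Dict[str, List[float]]


def mask_modalities(sample: Features, missing_rate: float, rng: random.Random) -> Features:
    """Zero out each modality with probability `missing_rate`, always keeping one.

    Assumption: zero-filling and the keep-one rule are illustrative, not ITEACNet's spec.
    """
    names = sorted(sample)
    dropped = [m for m in names if rng.random() < missing_rate]
    if names and len(dropped) == len(names):  # never remove every modality
        dropped.remove(rng.choice(dropped))
    return {m: ([0.0] * len(v) if m in dropped else list(v)) for m, v in sample.items()}


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: teacher logits come from complete data and student logits from
    masked data; this generic loss is a stand-in for the unspecified ITS objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def evaluate_across_missing_rates(model: Callable[[Features], int], samples: List[Features],
                                  labels: List[int], rates: Sequence[float],
                                  seed: int = 0) -> Dict[float, float]:
    """Accuracy of one fixed model at several missing rates; weights are never updated."""
    results = {}
    for rate in rates:
        rng = random.Random(seed)  # same seed per rate for reproducible masks
        masked = [mask_modalities(s, rate, rng) for s in samples]
        correct = sum(model(x) == y for x, y in zip(masked, labels))
        results[rate] = correct / len(samples)
    return results


def toy_model(x: Features) -> int:
    # Hypothetical stand-in for a trained student: predict 1 if summed features are positive.
    return int(sum(sum(v) for v in x.values()) > 0)


samples = [{"text": [1.0, 0.5], "audio": [0.2], "visual": [0.3]} for _ in range(4)]
labels = [1, 1, 1, 1]
print(evaluate_across_missing_rates(toy_model, samples, labels, [0.0, 0.5]))
# prints {0.0: 1.0, 0.5: 1.0}
```

Reusing the same frozen `model` for every rate mirrors the abstract's point that evaluation changes only the missing rate, not the weights. Any real comparison would substitute the trained ITEACNet student and the datasets' actual modalities.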