Two critical challenges still hinder the development of emotion recognition in conversation (ERC). First, little work has explored mining deeper insights from the conversational data itself for emotion recognition. Second, existing systems are vulnerable to randomly missing modality features, which are common in realistic settings. To address these two challenges, we propose a novel framework for incomplete multimodal learning in ERC, called the "Inverted Teacher-studEnt seArCH Network" (ITEACH-Net). ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Specifically, because emotional states tend to remain locally stable within conversational contexts, ECCE captures these local patterns and further perceives how they evolve over time. Because incomplete and complete data pose different challenges, ITS employs a teacher-student framework to decouple their respective computations. Then, through Neural Architecture Search, the student model develops stronger computational capabilities for handling incomplete data than the teacher model. For testing, we design a novel evaluation method that measures the model's performance under different missing-rate conditions without altering the model weights. Experiments on three benchmark ERC datasets demonstrate that ITEACH-Net outperforms existing methods in incomplete multimodal ERC. We believe ITEACH-Net can inspire research on the intrinsic nature of emotions in conversation scenarios and pave a more robust route for incomplete learning techniques. Code will be made available.
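The evaluation idea in the abstract (one trained model, several missing rates, no change to its weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's protocol. The modality names, the masking scheme (each modality dropped independently with probability equal to the missing rate, at least one modality kept per utterance, missing features zero-filled), and the functions `sample_missing_mask` and `evaluate_under_missing_rates` are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical modality names; the abstract does not list the modalities.
MODALITIES = ("audio", "text", "visual")

Utterance = Dict[str, List[float]]


def sample_missing_mask(num_utterances: int, missing_rate: float,
                        rng: random.Random) -> List[List[bool]]:
    """Per-utterance availability masks (True = modality observed).

    Assumed scheme: each modality is dropped independently with
    probability `missing_rate`, but at least one modality is always kept.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    masks = []
    for _ in range(num_utterances):
        mask = [rng.random() >= missing_rate for _ in MODALITIES]
        if not any(mask):
            # Keep one randomly chosen modality so the utterance is not empty.
            mask[rng.randrange(len(MODALITIES))] = True
        masks.append(mask)
    return masks


def evaluate_under_missing_rates(
        predict: Callable[[List[Utterance]], Sequence[int]],
        dialogues: List[List[Utterance]],
        labels: List[List[int]],
        rates: Sequence[float],
        seed: int = 0) -> Dict[float, float]:
    """Accuracy of one frozen predictor at each missing rate.

    `predict` is only called, never updated, so the same weights are
    evaluated under every missing-rate condition.
    """
    results = {}
    for rate in rates:
        rng = random.Random(seed)  # same seed for every rate, for reproducibility
        correct = total = 0
        for dialogue, gold in zip(dialogues, labels):
            masks = sample_missing_mask(len(dialogue), rate, rng)
            # Zero-fill the features of missing modalities (an assumption).
            masked = [
                {m: (utt[m] if keep else [0.0] * len(utt[m]))
                 for m, keep in zip(MODALITIES, mask)}
                for utt, mask in zip(dialogue, masks)
            ]
            preds = predict(masked)
            correct += sum(int(p == y) for p, y in zip(preds, gold))
            total += len(gold)
        results[rate] = correct / total if total else 0.0
    return results
```

For example, calling `evaluate_under_missing_rates(model_predict, dialogues, labels, [0.0, 0.2, 0.4, 0.6])` with a hypothetical trained `model_predict` would report one accuracy per missing rate. Because the predictor is treated as a read-only callable, every condition is measured against identical weights.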