Multimodal emotion recognition in conversation (MERC) has recently attracted substantial research attention. Existing MERC methods face several challenges: (1) they fail to fully harness direct inter-modal cues, which can leave cross-modal interactions under-modeled; (2) they extract information from the same and different modalities concurrently at each network layer, potentially triggering conflicts when fusing multi-source data; (3) they lack the agility to detect dynamic sentiment changes, which can lead to misclassification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components, namely the GSF and SDP modules. GSF leverages graph structures to alternately assimilate inter-modal and intra-modal emotional dependencies layer by layer, adequately capturing cross-modal cues while effectively circumventing fusion conflicts. SDP is an auxiliary task that explicitly delineates the sentiment dynamics between utterances, enhancing the model's ability to distinguish sentimental discrepancies. Furthermore, GraphSmile can be readily applied to multimodal sentiment analysis in conversation (MSAC), yielding a unified multimodal affective model capable of executing both MERC and MSAC tasks. Empirical results on multiple benchmarks demonstrate that GraphSmile handles complex emotional and sentimental patterns, significantly outperforming baseline models.
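The GSF module's layer-by-layer alternation between inter-modal and intra-modal propagation can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the mean-aggregation update, the three-modality node indexing, and the `gsf_sketch` function name are all assumptions made here to convey the alternation idea.

```python
# Illustrative sketch of alternating inter-/intra-modal graph propagation
# (a hypothetical reading of the GSF module; details are assumptions).

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(d)]

def propagate(features, neighbors):
    """One round of mean aggregation over a neighbor map.
    features: {node: feature vector}; neighbors: {node: [node, ...]}."""
    out = {}
    for node, feat in features.items():
        # Aggregate messages from neighbors together with the node itself.
        msgs = [features[nb] for nb in neighbors[node]] + [feat]
        out[node] = mean(msgs)
    return out

def gsf_sketch(features, num_utts, modalities=("t", "a", "v"), layers=2):
    """Alternate inter-modal and intra-modal propagation, layer by layer.
    Nodes are (modality, utterance_index) pairs."""
    # Inter-modal edges: same utterance, different modality.
    inter = {(m, i): [(m2, i) for m2 in modalities if m2 != m]
             for m in modalities for i in range(num_utts)}
    # Intra-modal edges: adjacent utterances within one modality.
    intra = {(m, i): [(m, j) for j in (i - 1, i + 1) if 0 <= j < num_utts]
             for m in modalities for i in range(num_utts)}
    for _ in range(layers):
        features = propagate(features, inter)  # cross-modal step
        features = propagate(features, intra)  # conversational-context step
    return features
```

Separating the two propagation steps, rather than mixing inter- and intra-modal messages in a single aggregation, is one way to avoid the multi-source fusion conflicts the abstract describes.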
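The SDP auxiliary task's target can be illustrated with a small sketch. This is a hypothetical reading of the abstract, assuming SDP labels mark whether sentiment polarity shifts between consecutive utterances; the label scheme and `sdp_labels` helper are assumptions, not the paper's definition.

```python
# Hypothetical sentiment-dynamics labels: 1 where the sentiment of an
# utterance differs from the previous one, 0 otherwise (assumed scheme).

def sdp_labels(polarities):
    """polarities: per-utterance sentiment labels, e.g. -1 / 0 / +1.
    The first utterance has no predecessor, so its label is 0."""
    return [0] + [int(prev != cur)
                  for prev, cur in zip(polarities, polarities[1:])]
```

Training the model to predict such shift labels alongside the main classification objective gives it an explicit signal for abrupt sentiment changes, which the abstract identifies as a weakness of prior methods.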