Designing effective representation learning methods for multimodal sentiment analysis is a crucial research direction. The challenge lies in learning both shared and private information within a complete modal representation, which is difficult with uniform multimodal labels and raw feature fusion. In this work, we propose a deep modal shared information learning module based on the covariance matrix to capture the information shared between modalities. Additionally, we use a label generation module based on a self-supervised learning strategy to capture the private information of each modality. Our module is plug-and-play in multimodal tasks; by changing its parameterization, it can adjust the information exchange relationship between modalities and learn the private or shared information between specified modalities. We also employ a multi-task learning strategy to help the model focus on training data that exhibit differences between modalities. We provide a detailed derivation and feasibility proof for the design of the deep modal shared information learning module. We conduct extensive experiments on three common multimodal sentiment analysis benchmark datasets, and the results validate the reliability of our model. Furthermore, we explore further combinations in which the module can be used. Our approach outperforms current state-of-the-art methods on most metrics across the three public datasets.
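To illustrate what a covariance-based measure of shared information between two modalities can look like, the following is a minimal sketch. It assumes a CORAL-style objective, namely the squared Frobenius distance between the feature covariance matrices of two modalities. The abstract does not state the module's actual formulation, so the function names, the scaling factor, and the toy text and audio features here are illustrative assumptions rather than the proposed method.

```python
import numpy as np


def covariance(features):
    """Unbiased covariance matrix (d x d) of a batch of feature vectors (n x d)."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / (features.shape[0] - 1)


def covariance_alignment_loss(feats_a, feats_b):
    """Squared Frobenius distance between two covariance matrices.

    The 1 / (4 d^2) scaling follows the CORAL convention and is an
    assumption, not something taken from the abstract.
    """
    d = feats_a.shape[1]
    diff = covariance(feats_a) - covariance(feats_b)
    return float(np.sum(diff ** 2) / (4 * d * d))


# Hypothetical toy features: 32 samples, 8 dimensions per modality.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(32, 8))
audio_feats = 3.0 * rng.normal(size=(32, 8))

# Identical second-order statistics give a loss of exactly zero.
print(covariance_alignment_loss(text_feats, text_feats))
# Mismatched statistics give a strictly positive loss.
print(covariance_alignment_loss(text_feats, audio_feats))
```

Under this assumption, minimizing the loss pulls the second-order statistics of the two modalities together, which is one plausible reading of "shared information"; weighting or negating the term would instead encourage modality-specific structure.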