The key challenge in modeling unaligned multimodal language sequences lies in effectively integrating information from the various modalities to obtain a refined multimodal joint representation. Recently, disentangle-and-fuse methods have achieved promising performance by explicitly learning modality-agnostic and modality-specific representations and then fusing them into a multimodal joint representation. However, these methods typically learn a modality-agnostic representation for each modality independently and rely on orthogonality constraints to reduce the linear correlations between the modality-agnostic and modality-specific representations, while failing to eliminate their nonlinear correlations. As a result, the obtained multimodal joint representation usually suffers from information redundancy, leading to overfitting and poor generalization. In this paper, we propose a Mutual Information-based Representations Disentanglement (MIRD) method for unaligned multimodal language sequences, in which a novel disentanglement framework is designed to jointly learn a single modality-agnostic representation. In addition, a mutual information minimization constraint is employed to achieve a stronger disentanglement of the representations, thereby eliminating information redundancy within the multimodal joint representation. Furthermore, the difficulty of estimating mutual information from limited labeled data is mitigated by introducing unlabeled data. The unlabeled data also help characterize the underlying structure of the multimodal data, which further prevents overfitting and improves model performance. Experimental results on several widely used benchmark datasets validate the effectiveness of the proposed approach.
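To make the mutual information minimization constraint concrete, the sketch below shows one common way such a penalty can be estimated and minimized between two representations: a CLUB-style variational upper bound on I(x; y) (Cheng et al., 2020). This is not the authors' implementation; the class name `CLUBUpperBound`, the hidden size, and the stand-in encoder outputs are illustrative assumptions, and CLUB is only one of several MI estimators that could play this role.

```python
# Minimal PyTorch sketch (not the authors' code) of an MI-minimization penalty
# for disentangling modality-agnostic from modality-specific representations.
# It estimates a CLUB-style variational upper bound on I(x; y) and would be
# added to the task loss so the encoders are pushed to reduce shared information.
import torch
import torch.nn as nn


class CLUBUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(x; y).

    A small network models q(y|x) = N(mu(x), diag(exp(logvar(x)))); the bound
    is E_p(x,y)[log q(y|x)] - E_p(x)p(y)[log q(y|x)], estimated here by
    contrasting matched pairs with within-batch shuffled pairs.
    """

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 128):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim)
        )
        # Tanh keeps the predicted log-variance in a bounded range for stability.
        self.logvar = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim), nn.Tanh()
        )

    def log_likelihood(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Per-sample log q(y|x) up to an additive constant (which cancels below).
        mu, logvar = self.mu(x), self.logvar(x)
        return (-0.5 * (y - mu) ** 2 / logvar.exp() - 0.5 * logvar).sum(dim=1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        positive = self.log_likelihood(x, y)                              # matched pairs
        negative = self.log_likelihood(x, y[torch.randperm(y.size(0))])   # shuffled pairs
        return (positive - negative).mean()                               # MI upper-bound estimate


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-ins for the modality-agnostic and modality-specific codes that the
    # disentanglement encoders would produce (hypothetical shapes: batch 32, dim 64).
    agnostic = torch.randn(32, 64)
    specific = torch.randn(32, 64)

    club = CLUBUpperBound(x_dim=64, y_dim=64)
    # In training one would alternate: (1) fit q(y|x) by maximizing its
    # log-likelihood on detached representations, then (2) minimize the
    # estimated bound w.r.t. the encoders, weighted by some lambda_mi.
    loss_q = -club.log_likelihood(agnostic.detach(), specific.detach()).mean()
    mi_penalty = club(agnostic, specific)
    print(f"q-fitting loss: {loss_q.item():.4f}, estimated MI upper bound: {mi_penalty.item():.4f}")
```

Because the bound is differentiable in the representations, minimizing it can suppress nonlinear as well as linear dependence between the two codes, which is precisely what an orthogonality constraint alone cannot guarantee.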