Human multimodal emotion recognition (MER) seeks to infer human emotions by integrating information from the language, visual, and acoustic modalities. Although existing MER approaches have achieved promising results, they still struggle with inherent multimodal heterogeneity and with the varying contributions of different modalities. To address these challenges, we propose a novel framework, Decoupled Hierarchical Multimodal Distillation (DHMD). DHMD decouples each modality's features into modality-irrelevant (homogeneous) and modality-exclusive (heterogeneous) components using a self-regression mechanism. The framework then applies a two-stage knowledge distillation (KD) strategy: (1) coarse-grained KD via a Graph Distillation Unit (GD-Unit) in each decoupled feature space, where a dynamic graph enables adaptive distillation among modalities, and (2) fine-grained KD through a cross-modal dictionary matching mechanism, which aligns semantic granularities across modalities to produce more discriminative MER representations. This hierarchical distillation approach enables flexible knowledge transfer and improves cross-modal feature alignment. Experimental results show that DHMD consistently outperforms state-of-the-art MER methods, achieving relative improvements of 1.3\%/2.4\% (ACC$_7$), 1.3\%/1.9\% (ACC$_2$), and 1.9\%/1.8\% (F1) on the CMU-MOSI/CMU-MOSEI datasets, respectively. Visualization results further reveal that both the graph edges and the dictionary activations in DHMD exhibit meaningful distribution patterns across the modality-irrelevant and modality-exclusive feature spaces.
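To make the coarse-grained stage more concrete, the following is a minimal sketch of graph-weighted distillation among three modalities. It is an illustration under stated assumptions, not the DHMD implementation. The function names (`softmax`, `l1_gap`, `graph_distillation_loss`), the L1 gap between per-modality logits, and the hand-set edge scores are all hypothetical. In a dynamic graph such as the one the abstract describes, the edge scores would presumably be produced by a learned module rather than fixed by hand.

```python
import math
from itertools import permutations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l1_gap(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def graph_distillation_loss(logits, edge_scores):
    """Weighted sum of pairwise gaps over all directed modality pairs.

    logits:      dict mapping modality name -> list of floats
    edge_scores: dict mapping (source, target) -> raw score (assumed given;
                 a learned network would normally produce these)
    Returns the scalar loss and the normalized edge weights.
    """
    edges = list(permutations(logits, 2))  # all directed edges
    weights = softmax([edge_scores[e] for e in edges])
    loss = sum(w * l1_gap(logits[s], logits[t])
               for w, (s, t) in zip(weights, edges))
    return loss, dict(zip(edges, weights))


# Toy inputs: language (L), visual (V), acoustic (A); values are made up.
logits = {"L": [1.0, 0.0], "V": [0.0, 0.0], "A": [1.0, 0.0]}
edge_scores = {e: 0.0 for e in permutations(logits, 2)}  # uniform graph
loss, weights = graph_distillation_loss(logits, edge_scores)
```

With all edge scores equal, every directed edge receives the same weight. Raising the score of a single edge, for example `("L", "V")`, shifts more of the loss onto that source-target pair, which is the sense in which the graph lets distillation strength adapt across modalities.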