We propose a novel method, Modality-based Redundancy Reduction Fusion (MRRF), for understanding and modulating the relative contribution of each modality in multimodal inference tasks. This is achieved by constructing an $(M+1)$-way tensor that captures the high-order relationships between the $M$ modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which adopts different factors for different modalities, removes information in one modality that can be compensated by the other modalities, with respect to the model outputs. This helps in understanding the relative utility of the information in each modality. In addition, it leads to a simpler model with fewer parameters and can therefore act as a regularizer that reduces overfitting. We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition. We are able to recognize the relationships and relative importance of the different modalities in these tasks, and we achieve a 1\% to 4\% improvement on several evaluation measures compared to the state-of-the-art for all three tasks.
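The factorization idea can be illustrated with a minimal NumPy sketch. It assumes a Tucker-style decomposition (the abstract does not name the specific factorization), $M = 3$ modalities, and arbitrary feature sizes and per-modality ranks. All names (`fuse_full`, `fuse_factorized`, `reconstruct`, `dims`, `ranks`) are hypothetical, and the factors are random rather than learned, so the sketch shows only the structure and the parameter savings, not the trained model.

```python
import numpy as np

def fuse_full(W, feats):
    """Contract an (M+1)-way weight tensor with M modality feature vectors."""
    out = W
    for x in feats:
        # Contract the leading modality axis; the output axis stays last.
        out = np.tensordot(x, out, axes=([0], [0]))
    return out

def fuse_factorized(core, factors, out_factor, feats):
    """Same fusion, computed through the small core tensor."""
    # Each modality is projected to its own (assumed) reduced size first.
    reduced = [U.T @ x for U, x in zip(factors, feats)]
    return out_factor @ fuse_full(core, reduced)

def reconstruct(core, factors, out_factor):
    """Rebuild the full weight tensor from the core and per-mode factors."""
    W = core
    for U in list(factors) + [out_factor]:
        # Mode product on the leading axis, then rotate that axis to the end;
        # after M+1 steps the axes are back in their original order.
        W = np.moveaxis(np.tensordot(U, W, axes=([1], [0])), 0, -1)
    return W

rng = np.random.default_rng(0)
dims, d_out = (8, 6, 4), 3      # assumed feature sizes for 3 modalities
ranks, r_out = (4, 2, 3), 3     # assumed per-modality factor sizes

core = rng.standard_normal(ranks + (r_out,))
factors = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
out_factor = rng.standard_normal((d_out, r_out))
feats = [rng.standard_normal(d) for d in dims]

y_small = fuse_factorized(core, factors, out_factor, feats)
y_full = fuse_full(reconstruct(core, factors, out_factor), feats)

n_full = int(np.prod(dims)) * d_out
n_factorized = core.size + sum(U.size for U in factors) + out_factor.size
print(np.allclose(y_small, y_full), n_full, n_factorized)
```

Choosing a smaller rank for one modality than for another compresses that modality more aggressively, which is one way to read "different factors for different modalities" in the abstract.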