Modality Influence in Multimodal Machine Learning

Multimodal Machine Learning has emerged as a prominent research direction in applications such as Sentiment Analysis, Emotion Recognition, Machine Translation, Hate Speech Recognition, and Movie Genre Classification, where modern deep learning architectures have produced promising results. Despite these achievements, challenges remain in data representation, alignment techniques, reasoning, generation, and quantification within multimodal learning. It is also commonly assumed that the textual modality plays the dominant role in decision-making, yet few studies have investigated how much each modality actually influences Multimodal Machine Learning systems. This paper addresses that gap by studying the impact of each modality on multimodal learning tasks, with the aim of verifying such presumptions and gaining insight into how the different modalities are used. The main contribution of this work is a methodology for determining the effect of each modality on several Multimodal Machine Learning models and datasets drawn from various tasks: Multimodal Sentiment Analysis, Multimodal Emotion Recognition, Multimodal Hate Speech Recognition, and Multimodal Disease Detection. To this end, state-of-the-art Multimodal Machine Learning models are trained with individual modalities masked, and the resulting change in performance is used to evaluate the impact of each masked modality. The research further aims to identify the most influential modality or set of modalities for each task and to draw conclusions that hold across diverse multimodal classification tasks. These investigations contribute to a better understanding of the role of individual modalities in multimodal learning and provide insights for future work in this field.
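The masking procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `mask_modality`, `modality_influence`, and `toy_score` are hypothetical, zero-filling is only one possible way to mask a modality, and `toy_score` with its arbitrary weights merely stands in for "train a model on this data and return its evaluation metric".

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a sample maps each modality name to a flat feature vector.
Sample = Dict[str, List[float]]


def mask_modality(samples: Sequence[Sample], modality: str) -> List[Sample]:
    """Return a copy of the dataset with one modality's features zeroed out."""
    return [
        {
            name: ([0.0] * len(feats) if name == modality else list(feats))
            for name, feats in sample.items()
        }
        for sample in samples
    ]


def modality_influence(
    train_and_score: Callable[[Sequence[Sample]], float],
    samples: Sequence[Sample],
    modalities: Sequence[str],
) -> Dict[str, float]:
    """Score drop (unmasked baseline minus masked run) for each modality."""
    baseline = train_and_score(samples)
    return {
        m: baseline - train_and_score(mask_modality(samples, m))
        for m in modalities
    }


def toy_score(samples: Sequence[Sample]) -> float:
    """Hypothetical stand-in for training a model and returning its metric.

    Each modality that still carries non-zero features adds an arbitrary,
    made-up weight to the score; a real study would train and evaluate a model.
    """
    weights = {"text": 0.6, "audio": 0.3, "video": 0.1}  # illustrative only
    total = 0.0
    for sample in samples:
        total += sum(weights[m] for m, feats in sample.items() if any(feats))
    return total / len(samples)


data = [{"text": [1.0, 2.0], "audio": [0.5], "video": [0.2, 0.1]}]
influence = modality_influence(toy_score, data, ["text", "audio", "video"])
most_influential = max(influence, key=influence.get)
```

In an actual experiment, `train_and_score` would wrap the full training and evaluation loop of the model under study, and the ranking in `influence` would reflect that model and dataset rather than the made-up weights used here.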
