The imbalance problem is widespread in machine learning, and it also arises in multimodal learning because of the intrinsic discrepancy between the modalities of samples. Recent works have attempted to solve the modality imbalance problem from an algorithmic perspective; however, they do not fully analyze the influence of modality bias in datasets. Concretely, existing multimodal datasets are usually collected for specific tasks, where one modality tends to perform better than the others in most conditions. In this work, to comprehensively explore the influence of modality bias, we first split existing datasets into different subsets by estimating sample-wise modality discrepancy. Surprisingly, we find that multimodal models with existing imbalance algorithms consistently perform worse than the unimodal model on specific subsets, in accordance with the modality bias. To further explore the influence of modality bias and analyze the effectiveness of existing imbalance algorithms, we build a balanced audiovisual dataset in which modality discrepancy is uniformly distributed over the whole dataset. We then conduct extensive experiments to re-evaluate existing imbalance algorithms and draw two findings: existing algorithms only provide a compromise between modalities, and they suffer when the modality discrepancy of samples is large. We hope that these findings will facilitate future research on the modality imbalance problem.
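The subset split described above can be illustrated with a minimal sketch. The abstract does not state how sample-wise modality discrepancy is estimated, so the definition used here (the difference in true-class probability between an audio-only and a visual-only model) is an assumption, as are the function names `modality_discrepancy` and `split_by_discrepancy` and the equal-width binning.

```python
import numpy as np

def modality_discrepancy(audio_probs, visual_probs, labels):
    """Per-sample discrepancy between two unimodal models.

    ASSUMPTION: discrepancy is the audio model's true-class probability
    minus the visual model's. This is an illustrative definition only.
    Positive values mean audio is more reliable for that sample.
    """
    idx = np.arange(len(labels))
    return audio_probs[idx, labels] - visual_probs[idx, labels]

def split_by_discrepancy(discrepancy, n_bins=3):
    """Assign each sample to one of n_bins equal-width bins over [-1, 1].

    Returns an integer subset id per sample, from 0 (visual-dominant)
    to n_bins - 1 (audio-dominant).
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    # Only interior edges are passed, so ids fall in 0..n_bins-1.
    return np.digitize(discrepancy, edges[1:-1])

# Toy example: three samples, two classes, all with true label 0.
audio_probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
visual_probs = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
labels = np.array([0, 0, 0])

disc = modality_discrepancy(audio_probs, visual_probs, labels)
subsets = split_by_discrepancy(disc, n_bins=3)
```

With subset ids in hand, multimodal and unimodal accuracy can be compared within each subset rather than only over the whole dataset.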