Training multimodal networks requires large amounts of data because their parameter space is larger than that of unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples likely to improve model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data they often result in sample selection that is biased toward the dominant modality. This imbalance hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, we propose a novel approach that achieves fairer data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality, and that it outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balanced sample selection in multimodal active learning and provides a practical solution for multimodal classification.