Training multimodal networks requires a vast amount of data because their parameter space is larger than that of unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that are likely to improve model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often bias sample selection toward the dominant modality. This unfairness hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, we propose a novel approach that achieves fairer data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality. Our approach outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balancing sample selection in multimodal active learning and provides a practical solution for multimodal classification.