Acquiring properly annotated data is expensive in the medical field because it requires experts, time-consuming protocols, and rigorous validation. Active learning aims to reduce the need for large annotated datasets by selecting the most informative examples for annotation. Because these examples contribute most to improving the performance of supervised machine learning models, active learning can play an essential role in selecting the most useful data for deep learning-based diagnosis, clinical assessment, and treatment planning. Although existing works have proposed methods for sampling the best examples for annotation in medical image analysis, they are not task-agnostic and do not use multimodal auxiliary information in the sampler, which has the potential to increase robustness. In this work, we therefore propose a Multimodal Variational Adversarial Active Learning (M-VAAL) method that uses auxiliary information from additional modalities to enhance active sampling. We evaluated our method on two datasets: i) BraTS2018, for brain tumor segmentation and multi-label classification, and ii) COVID-QU-Ex, for chest X-ray image classification. Our results show a promising direction toward data-efficient learning under limited annotations.
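The abstract does not spell out how the sampler chooses examples, so the following is only a minimal sketch of the selection step as it is commonly done in variational adversarial active learning: a discriminator scores each unlabeled sample with the probability that it resembles the labeled set, and the lowest-scoring samples are sent for annotation. The function names, the integer sample ids, and the score dictionary are hypothetical and are not taken from the authors' implementation; the multimodal auxiliary information that M-VAAL adds to the sampler is not modeled here.

```python
from typing import Dict, List, Set, Tuple


def select_for_annotation(labeled_prob: Dict[int, float], budget: int) -> List[int]:
    """Return the ids of the `budget` samples with the lowest scores.

    `labeled_prob` maps a sample id to a (hypothetical) discriminator output:
    the estimated probability that the sample looks like the labeled set.
    Low scores mark samples that are poorly represented by the labeled data,
    so they are assumed to be the most informative ones to annotate.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Sort by score, breaking ties by sample id so the result is deterministic.
    ranked = sorted(labeled_prob, key=lambda i: (labeled_prob[i], i))
    return ranked[:budget]


def active_learning_round(
    labeled: Set[int],
    unlabeled: Set[int],
    labeled_prob: Dict[int, float],
    budget: int,
) -> Tuple[Set[int], Set[int]]:
    """Move the selected samples from the unlabeled pool to the labeled set."""
    pool_scores = {i: labeled_prob[i] for i in unlabeled}
    picked = set(select_for_annotation(pool_scores, budget))
    return labeled | picked, unlabeled - picked


# Toy scores for four unlabeled samples (made-up numbers for illustration).
scores = {10: 0.91, 11: 0.12, 12: 0.47, 13: 0.05}
labeled, unlabeled = active_learning_round({1}, {10, 11, 12, 13}, scores, budget=2)
print(sorted(labeled), sorted(unlabeled))
```

In a full pipeline this round would repeat: retrain the task model and the sampler on the enlarged labeled set, rescore the remaining pool, and select again until the annotation budget is exhausted.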