Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have one subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: either discard it or fully annotate it. The first option reduces detection accuracy because it does not use the whole dataset, and the second option is too expensive because the annotation must be done by expert radiologists. In this paper, we propose a middle-ground solution to this dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages: 1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and 2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully- and weakly-annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations, and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.
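To make the incomplete-annotation setting more concrete, the following is a minimal sketch of how a training sample and one student-teacher step might be represented. It is not the implementation described in this paper. The names `Sample`, `ema_update` and `select_pseudo_boxes` are hypothetical, and three design choices are assumptions made purely for illustration: the exponential-moving-average teacher update, the confidence threshold, and the use of the global label to suppress pseudo-boxes on non-malignant images.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


@dataclass
class Sample:
    """One mammogram under incomplete annotation (hypothetical structure)."""
    image_id: str
    is_malignant: bool          # global (weak) label, available for every image
    boxes: Optional[List[Box]]  # lesion boxes; None when only weakly annotated


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.99) -> List[float]:
    """Assumed teacher update: exponential moving average of student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def select_pseudo_boxes(sample: Sample,
                        teacher_preds: List[Tuple[Box, float]],
                        threshold: float = 0.5) -> List[Box]:
    """Choose the boxes used to supervise the student for one sample.

    Fully annotated samples keep their ground-truth boxes. Weakly annotated
    samples use confident teacher predictions as pseudo-labels, and only
    when the global label says the image is malignant (an assumption).
    """
    if sample.boxes is not None:
        return sample.boxes
    if not sample.is_malignant:
        return []
    return [box for box, score in teacher_preds if score >= threshold]
```

In this sketch, the weak global label is what distinguishes the setting from plain semi-supervised detection: every image contributes at least a classification signal, while localisation supervision comes either from radiologist annotations or from the teacher.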