We consider training classifiers for 3D medical images using only one binary label for the entire volume rather than a label for each 2D slice. In such weakly supervised settings, can we learn accurate classifiers for slice-level predictions? Attention-based multiple instance learning (MIL) can produce an attention score for every slice. Yet recent work demonstrates that a simple center-focused baseline that ignores image content can outperform attention-based and transformer-based MIL at slice-level classification of 3D brain scans. We show that this baseline also outperforms existing MIL methods at slice-level classification of thoracic and abdominal CT scans. Motivated by this baseline, we propose Normal Guidance, a regularization technique that encourages the learned attention distribution to follow a bell-shaped curve. Across three medical imaging datasets totaling over 4 million 2D slices, we show that Normal Guidance enables attention-based and transformer-based MIL methods to deliver significantly better slice-level localization than the state of the art while remaining competitive at whole-scan classification.
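To make the idea of steering attention toward a bell-shaped curve concrete, the following is a minimal sketch, not the paper's actual formulation. The abstract does not state the loss, so everything here is an assumption made for illustration: the KL-divergence penalty, the prior centered on the middle slice, and the names `gaussian_prior`, `bell_curve_penalty`, `regularized_loss`, `mean_frac`, `std_frac`, and `lam` are all hypothetical.

```python
import numpy as np


def gaussian_prior(num_slices, mean_frac=0.5, std_frac=0.15):
    """Discretized bell curve over slice indices, normalized to sum to 1.

    mean_frac and std_frac are hypothetical hyperparameters (assumptions):
    the center and width of the curve as fractions of the scan depth.
    """
    idx = np.arange(num_slices, dtype=np.float64)
    mu = mean_frac * (num_slices - 1)
    sigma = std_frac * num_slices
    weights = np.exp(-0.5 * ((idx - mu) / sigma) ** 2)
    return weights / weights.sum()


def softmax(scores):
    """Numerically stable softmax over a 1D array of per-slice scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def bell_curve_penalty(attention, prior, eps=1e-12):
    """KL(prior || attention): zero when attention matches the prior.

    KL divergence is an assumed choice of penalty, not taken from the source.
    """
    return float(np.sum(prior * (np.log(prior + eps) - np.log(attention + eps))))


def regularized_loss(scan_loss, attention_logits, lam=0.1):
    """Whole-scan loss plus a weighted bell-curve penalty on the attention."""
    attention = softmax(np.asarray(attention_logits, dtype=np.float64))
    prior = gaussian_prior(len(attention))
    return scan_loss + lam * bell_curve_penalty(attention, prior)
```

In this sketch the whole-scan classification loss is left untouched and the penalty only shapes the per-slice attention, which mirrors the stated goal of improving slice-level localization while staying competitive at whole-scan classification. The weight `lam` would trade off the two terms.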