Weakly Supervised Learning with Automated Labels from Radiology Reports for Glioma Change Detection

Gliomas are the most frequent primary brain tumors in adults. Glioma change detection aims at finding the relevant parts of the image that change over time. Although Deep Learning (DL) shows promising performance in similar change detection tasks, the creation of large annotated datasets represents a major bottleneck for supervised DL applications in radiology. To overcome this, we propose a combined use of weak labels (imprecise, but fast-to-create annotations) and Transfer Learning (TL). Specifically, we explore inductive TL, where source and target domains are identical but tasks differ because of a label shift: our target labels are created manually by three radiologists, whereas our source weak labels are generated automatically from radiology reports via Natural Language Processing (NLP). We frame knowledge transfer as hyperparameter optimization, thus avoiding heuristic choices that are frequent in related works. We investigate the relationship between model size and TL, comparing a low-capacity VGG with a higher-capacity ResNeXt model. We evaluate our models on 1693 T2-weighted magnetic resonance imaging difference maps created from 183 patients, classifying each map as stable or unstable according to tumor evolution. The weak labels extracted from radiology reports allowed us to increase the dataset size more than 3-fold and to improve VGG classification performance from 75% to 82% area under the ROC curve (AUC). Mixed training from scratch led to higher performance than fine-tuning or feature extraction. To assess generalizability, we ran inference on an open dataset (BraTS-2015: 15 patients, 51 difference maps), reaching up to 76% AUC. Overall, the results suggest that medical imaging problems may benefit from smaller models and from TL strategies different from those used on computer vision datasets, and that report-generated weak labels are effective in improving model performance. The code, the in-house dataset, and the BraTS labels are released.
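The abstract reports results as AUC on a binary stable/unstable task and treats the choice of transfer strategy as part of hyperparameter optimization. What follows is a minimal sketch of both ideas, under stated assumptions. AUC is computed with the pairwise Mann-Whitney formulation. The transfer strategy is modeled as a single categorical hyperparameter that is selected by validation AUC. The strategy names, `select_strategy`, and the `train_and_score` callback are hypothetical and are not taken from the released code. The callback would, for example, train a VGG or ResNeXt model with the given strategy and return its validation AUC.

```python
from typing import Callable, Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    labels: 1 = unstable (positive class), 0 = stable.
    A tied positive/negative pair counts as half a "win".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked correctly.
    return wins / (len(pos) * len(neg))


# Hypothetical identifiers for the three transfer strategies compared in the abstract.
STRATEGIES = ("mixed_from_scratch", "fine_tuning", "feature_extraction")


def select_strategy(
    train_and_score: Callable[[str], float],
    strategies: Sequence[str] = STRATEGIES,
) -> Tuple[str, Dict[str, float]]:
    """Treat the transfer strategy as a categorical hyperparameter:
    score every value with the (hypothetical) callback and keep the best."""
    results = {s: train_and_score(s) for s in strategies}
    best = max(results, key=lambda s: results[s])
    return best, results


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The exhaustive loop over strategies is used only for readability. A real search would more likely tune the strategy jointly with learning rate and other settings in a hyperparameter optimization library. The pairwise AUC loop is quadratic in the number of samples, which is acceptable for datasets of the size described here.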
