Image-based diagnostic decision support systems (DDSS) utilizing deep learning have the potential to optimize clinical workflows. However, developing DDSS requires extensive datasets with expert annotations and is therefore costly. Leveraging report contents from radiological databases with natural language processing (NLP) to annotate the corresponding image data promises to replace labor-intensive manual annotation. Since mining "real-world" databases can introduce label noise, noise-robust training losses are of great interest. However, current noise-robust losses do not take noise estimates into account, such as those that can be derived from the performance of the automatic label generator used. In this study, we extend the noise-robust Deep Abstaining Classifier (DAC) loss to an Informed Deep Abstaining Classifier (IDAC) loss by incorporating noise-level estimates during training. Our findings demonstrate that the IDAC improves noise robustness compared to the DAC and several state-of-the-art loss functions. The results are obtained at various simulated noise levels on a public chest X-ray dataset and are reproduced on an in-house noisy dataset, where labels were extracted from the clinical systems of the University Hospital Bonn by a text-based transformer. The IDAC can therefore be a valuable tool for researchers, companies, or clinics aiming to develop accurate and reliable DDSS from routine clinical data.
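To make the abstention mechanism concrete, the following is a minimal NumPy sketch of the standard DAC loss (which augments the K real classes with an extra abstention class), together with a hypothetical "informed" penalty schedule that ties the abstention penalty to an external noise-level estimate. The `idac_alpha` schedule is an illustrative assumption, not the paper's exact IDAC formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dac_loss(logits, labels, alpha):
    """DAC loss per sample (Thulasidasan et al., 2019).
    logits: (N, K+1); the last column is the abstention class."""
    p = softmax(logits)
    p_abs = p[:, -1]                                  # abstention probability
    p_true = p[np.arange(len(labels)), labels]        # probability of true label
    eps = 1e-12
    # Cross-entropy over the K real classes, renormalised to exclude abstention
    # and down-weighted when the model abstains; alpha penalises abstention.
    ce = -np.log(p_true / (1.0 - p_abs) + eps)
    return (1.0 - p_abs) * ce - alpha * np.log(1.0 - p_abs + eps)

def idac_alpha(base_alpha, est_noise, abstain_rate):
    """Hypothetical 'informed' schedule: relax the abstention penalty while the
    observed abstention rate is below the estimated label-noise fraction, and
    tighten it once the model abstains on more than that fraction."""
    return base_alpha * (abstain_rate / max(est_noise, 1e-6))
```

A confidently correct sample incurs almost no loss, while abstaining is charged through the `alpha` term; making `alpha` depend on an estimated noise fraction lets the model abstain on roughly the share of samples believed to be mislabeled.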