Radiologists are in short supply globally, and deep learning models, deployed as part of clinical decision-support systems, offer a promising way to mitigate this shortage. However, training such models typically requires expensive and time-consuming manual labeling of large datasets. Automatically extracting labels from radiology reports can reduce the time needed to obtain labeled datasets, but the task is challenging because of semantically similar words and the scarcity of annotated data. In this work, we explore using a rule-based labeler as a source of weak supervision for a deep learning-based label prediction model. We propose a deep learning-based CheXpert label prediction model that is pre-trained on reports labeled by a rule-based German CheXpert model and then fine-tuned on a small dataset of manually labeled reports. Our results show that this approach significantly outperforms the rule-based model on all three tasks. These findings highlight the benefits of deep learning-based models even when data are sparse, and of rule-based labelers as a tool for weak supervision.
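The two-stage weak-supervision scheme described above can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical: the two findings, the German trigger stems and negation cues, the toy `rule_label` function (far simpler than the German CheXpert labeler, and binary, with no "uncertain" class), and the example reports. A bag-of-words logistic regression stands in for the deep learning model so that the example runs with the Python standard library alone; only the training schedule (pre-training on rule-generated labels, then fine-tuning on a few manually labeled reports) mirrors the approach summarized here.

```python
import math
from collections import defaultdict

# Hypothetical toy subset of CheXpert findings, each with one German trigger stem.
LABELS = {"Pleural Effusion": "erguss", "Pneumothorax": "pneumothorax"}
NEGATION_CUES = {"kein", "keine", "ohne"}


def tokenize(report):
    return report.lower().replace(".", " ").replace(",", " ").split()


def features(report):
    """Unigram and bigram features, so the model can weight e.g. 'kein_erguss'."""
    tokens = tokenize(report)
    return set(tokens) | {a + "_" + b for a, b in zip(tokens, tokens[1:])}


def rule_label(report):
    """Toy rule-based labeler (binary, no 'uncertain' class): a finding is
    positive if its trigger occurs and the preceding token is not a negation cue."""
    tokens = tokenize(report)
    labels = {}
    for name, trigger in LABELS.items():
        labels[name] = int(any(
            trigger in tok and (i == 0 or tokens[i - 1] not in NEGATION_CUES)
            for i, tok in enumerate(tokens)))
    return labels


class MultiLabelLogReg:
    """One logistic regression per finding; a stand-in for the deep model."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = {name: defaultdict(float) for name in self.labels}
        self.b = {name: 0.0 for name in self.labels}

    def prob(self, feats, name):
        z = self.b[name] + sum(self.w[name].get(f, 0.0) for f in feats)
        z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, reports, targets, epochs, lr):
        """Plain SGD on binary cross-entropy, updating each finding independently."""
        for _ in range(epochs):
            for report, y in zip(reports, targets):
                feats = features(report)
                for name in self.labels:
                    grad = self.prob(feats, name) - y[name]  # dLoss/dz
                    self.b[name] -= lr * grad
                    for f in feats:
                        self.w[name][f] -= lr * grad
        return self

    def predict(self, report, threshold=0.5):
        feats = features(report)
        return {name: int(self.prob(feats, name) >= threshold) for name in self.labels}


def train_weakly_supervised(unlabeled, gold_reports, gold_labels):
    model = MultiLabelLogReg(LABELS)
    # Stage 1: pre-train on reports labeled automatically by the rule-based labeler.
    model.fit(unlabeled, [rule_label(r) for r in unlabeled], epochs=20, lr=0.5)
    # Stage 2: fine-tune the same weights on the small manually labeled set.
    model.fit(gold_reports, gold_labels, epochs=20, lr=0.1)
    return model


if __name__ == "__main__":
    unlabeled = ["Pleuraerguss rechts.", "Kein Erguss.", "Kein Pneumothorax.",
                 "Pneumothorax links.", "Unauffälliger Befund.",
                 "Kleiner Erguss links, kein Pneumothorax."]
    # Hypothetical manual labels; the first report defeats the negation rule.
    gold_reports = ["Erguss nicht mehr nachweisbar.", "Pleuraerguss beidseits."]
    gold_labels = [{"Pleural Effusion": 0, "Pneumothorax": 0},
                   {"Pleural Effusion": 1, "Pneumothorax": 0}]
    model = train_weakly_supervised(unlabeled, gold_reports, gold_labels)
    print(rule_label(gold_reports[0]), model.predict(gold_reports[0]))
```

In this toy setup the rule misses the post-posed negation in "Erguss nicht mehr nachweisbar." ("effusion no longer detectable") and labels it positive, whereas the fine-tuned model labels it negative because stage 2 sees the manual label. The smaller learning rate in stage 2 is one common way to retain what was learned from the weak labels; it is an assumption of this sketch, not a detail taken from the abstract.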