As deep learning (DL) continues to demonstrate its ability in radiological tasks, it is critical that clinical DL solutions be optimized with safety in mind. One of the principal concerns in the clinical adoption of DL tools is trust. This study applies conformal prediction as a step toward trustworthy DL in radiology. This is a retrospective study of 491 non-contrast head CTs from the CQ500 dataset, in which three senior radiologists annotated slices containing intracranial hemorrhage (ICH). The dataset was split into definite and challenging subsets, where challenging images were defined as those on which the readers disagreed. A DL model was trained on 146 patients (10,815 slices) from the definite data (training dataset) to perform ICH localization and classification for five classes of ICH. To develop an uncertainty-aware DL model, 1,546 cases from the definite data (calibration dataset) were used for Mondrian conformal prediction (MCP). The uncertainty-aware DL model was tested on 8,401 definite and challenging cases to assess its ability to identify challenging cases. After the MCP procedure, the model achieved an F1 score of 0.920 for ICH classification on the test dataset. Additionally, it correctly identified 6,837 of the 6,856 challenging cases as challenging (99.7% accuracy) and did not label any definite case as challenging. The uncertainty-aware ICH detector performs on par with state-of-the-art models. MCP's performance in detecting challenging cases demonstrates that it is useful in automated ICH detection and promising for trustworthiness in radiological DL.
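To make the Mondrian conformal prediction step more concrete, the following is a minimal sketch of class-conditional (Mondrian) inductive conformal prediction in Python with NumPy. It is not the study's implementation. Several choices here are assumptions for illustration only: the nonconformity score (one minus the predicted probability of the class), the single-label multiclass setting, the significance level `epsilon`, the function names `calibrate`, `mondrian_p_values`, and `flag_uncertain`, and the rule that a case is flagged as uncertain whenever its prediction set is not a single class.

```python
import numpy as np


def calibrate(cal_probs, cal_labels, n_classes):
    """Per-class sorted nonconformity scores from a held-out calibration set.

    Assumed score: 1 - predicted probability of the true class.
    """
    true_class_probs = cal_probs[np.arange(len(cal_labels)), cal_labels]
    scores = 1.0 - true_class_probs
    # Mondrian step: keep a separate score distribution for each class.
    return [np.sort(scores[cal_labels == c]) for c in range(n_classes)]


def mondrian_p_values(class_scores, test_probs):
    """Class-conditional p-values, shape (n_test, n_classes)."""
    test_scores = 1.0 - test_probs
    pvals = np.empty_like(test_scores, dtype=float)
    for c, sorted_scores in enumerate(class_scores):
        # Count calibration scores of class c that are >= the test score.
        idx = np.searchsorted(sorted_scores, test_scores[:, c], side="left")
        n_ge = len(sorted_scores) - idx
        pvals[:, c] = (n_ge + 1) / (len(sorted_scores) + 1)
    return pvals


def flag_uncertain(pvals, epsilon):
    """Return prediction sets and a flag for non-singleton sets.

    Assumed rule: a case is 'uncertain' when its prediction set is empty
    or contains more than one class at significance level epsilon.
    """
    pred_sets = pvals > epsilon
    uncertain = pred_sets.sum(axis=1) != 1
    return pred_sets, uncertain
```

In use, `calibrate` would be run once on the model's softmax outputs for the calibration cases, and `mondrian_p_values` followed by `flag_uncertain` would be applied to each test case. Keeping a separate calibration distribution per class is what distinguishes the Mondrian variant from ordinary inductive conformal prediction: the error rate is controlled within each class rather than only on average, which matters when some classes are rare.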