The impact of deep learning aid on the workload and interpretation accuracy of radiologists on chest computed tomography: a cross-over reader study

Anvar Kurmukov, Valeria Chernina, Regina Gareeva, Maria Dugova, Ekaterina Petrash, Olga Aleshina, Maxim Pisov, Boris Shirokikh, Valentin Samokhin, Vladislav Proskurov, Stanislav Shimovolos, Maria Basova, Mikhail Goncharov, Eugenia Soboleva, Maria Donskova, Farukh Yaushev, Alexey Shevtsov, Alexey Zakharov, Talgat Saparov, Victor Gombolevskiy, Mikhail Belyaev

From arXiv; 17 pages, 6 figures, 8 tables

Interpretation of chest computed tomography (CT) is time-consuming. Previous studies have measured the time-saving effect of using a deep-learning-based aid (DLA) for CT interpretation. We evaluated the joint impact of a multi-pathology DLA on the time and accuracy of radiologists' readings. Forty radiologists were randomly split into three experimental arms: a control group (10), who interpreted studies without assistance; an informed group (10), who were briefed about the pathologies the DLA targets but performed their readings without it; and an experimental group (20), who interpreted half of their studies with the DLA and half without. All arms used the same 200 CT studies, retrospectively collected from the BIMCV-COVID19 dataset; each radiologist read 20 CT studies. We compared interpretation time and the accuracy of participants' diagnostic reports with respect to 12 pathological findings. Mean reading time per study was 15.6 minutes [SD 8.5] in the control arm, 13.2 minutes [SD 8.7] in the informed arm, 14.4 minutes [SD 10.3] in the experimental arm without DLA, and 11.4 minutes [SD 7.8] in the experimental arm with DLA. Mean sensitivity and specificity (%) were 41.5 [SD 30.4] and 86.8 [SD 28.3] in the control arm; 53.5 [SD 22.7] and 92.3 [SD 9.4] in the informed non-assisted arm; 63.2 [SD 16.4] and 92.3 [SD 8.2] in the experimental arm without DLA; and 91.6 [SD 7.2] and 89.9 [SD 6.0] in the experimental arm with DLA. The DLA reduced interpretation time per study by 2.9 minutes (CI95 [1.7, 4.3], p<0.0005), increased sensitivity by 28.4 percentage points (CI95 [23.4, 33.4], p<0.0005), and decreased specificity by 2.4 percentage points (CI95 [0.6, 4.3], p=0.13). Of the 20 radiologists in the experimental arm, 16 improved both reading time and sensitivity, two improved their time with a marginal drop in sensitivity, and two improved sensitivity at the cost of longer reading time. Overall, introducing the DLA decreased reading time by 20.6%.
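The accuracy figures above are sensitivity and specificity of the readers' reports over 12 findings. The abstract does not spell out the scoring protocol, so the following is only a minimal sketch under stated assumptions: each study is scored as binary present/absent calls for every finding against a reference standard, all study-finding pairs of one reader are pooled, and an arm's mean is a plain average over its readers. The names `sensitivity_specificity` and `arm_mean` and the toy data are hypothetical, not the authors' code.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[Sequence[bool]], calls: Sequence[Sequence[bool]]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) in percent for one reader.

    truth[i][j] is True if finding j is present in study i according to the
    reference standard; calls[i][j] is True if the reader reported it.
    All study-finding pairs are pooled (an assumption, not the paper's protocol).
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must cover the same studies")
    tp = fn = tn = fp = 0
    for truth_row, call_row in zip(truth, calls):
        if len(truth_row) != len(call_row):
            raise ValueError("each study must list the same findings")
        for present, reported in zip(truth_row, call_row):
            if present and reported:
                tp += 1  # true positive: finding present and reported
            elif present:
                fn += 1  # false negative: finding present but missed
            elif reported:
                fp += 1  # false positive: finding absent but reported
            else:
                tn += 1  # true negative: finding absent and not reported
    sens = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    spec = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def arm_mean(per_reader: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-reader (sensitivity, specificity) over the readers of one arm."""
    return mean(s for s, _ in per_reader), mean(p for _, p in per_reader)


# Hypothetical toy reader: 2 studies x 3 findings (the study used 12 findings).
truth = [[True, False, False], [False, True, False]]
calls = [[True, False, True], [False, True, False]]
print(sensitivity_specificity(truth, calls))  # (100.0, 75.0)

# Point estimates from the abstract: the reported sensitivity and specificity
# effects coincide with the difference of the experimental arm's means.
sens_gain = round(91.6 - 63.2, 1)  # 28.4 percentage points
spec_loss = round(92.3 - 89.9, 1)  # 2.4 percentage points
print(sens_gain, spec_loss)  # 28.4 2.4
```

Pooling all pairs per reader weights each finding by its prevalence in that reader's studies; averaging per-finding sensitivities instead would generally give a different number, and the abstract does not state which convention was used. The confidence intervals and p-values reported above come from the authors' own analysis and are not reproduced by this sketch.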
