Objectives: To apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. Methods: A prior study presented an algorithm that detects thyroid nodules and then classifies their malignancy using two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested on 99 separate nodules; its results were comparable to those of radiologists. In this study, the algorithm was further tested on 378 nodules imaged with ultrasound machines whose manufacturers and product types differed from those of the training cases. Four experienced radiologists evaluated the same nodules for comparison with the deep learning algorithm. Results: The area under the curve (AUC) for the deep learning algorithm and for each of the four radiologists was calculated using parametric binormal estimation. The AUC of the deep learning algorithm was 0.69 (95% CI: 0.64-0.75). The AUCs of the four radiologists were 0.63 (95% CI: 0.59-0.67), 0.66 (95% CI: 0.61-0.71), 0.65 (95% CI: 0.60-0.70), and 0.63 (95% CI: 0.58-0.67). Conclusion: On the new testing dataset, the deep learning algorithm performed similarly to all four radiologists. The relative performance difference between the algorithm and the radiologists was not significantly affected by the difference in ultrasound scanners.
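For readers unfamiliar with the binormal model named in the Results, the following is a minimal sketch of how a binormal AUC can be computed, not the estimation procedure used in the study. It assumes the scores of malignant and benign nodules are each normally distributed and fits the two distributions by simple sample means and standard deviations, whereas parametric ROC software typically fits the model by maximum likelihood on ordinal ratings. The function names `normal_cdf` and `binormal_auc` and the toy scores are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(malignant_scores, benign_scores):
    """AUC under the binormal model: AUC = Phi(a / sqrt(1 + b^2)).

    Assumption: parameters are estimated by sample moments, which is a
    simplification of the maximum-likelihood fit used by ROC packages.
    """
    mu1, sd1 = mean(malignant_scores), stdev(malignant_scores)
    mu0, sd0 = mean(benign_scores), stdev(benign_scores)
    a = (mu1 - mu0) / sd1   # separation of the two score distributions
    b = sd0 / sd1           # ratio of their standard deviations
    return normal_cdf(a / math.sqrt(1.0 + b * b))


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    malignant = [0.6, 0.7, 0.8, 0.5, 0.9]
    benign = [0.2, 0.4, 0.3, 0.5, 0.6]
    print(f"binormal AUC: {binormal_auc(malignant, benign):.3f}")
```

Under this model an AUC of 0.5 corresponds to identical score distributions for malignant and benign nodules, and values approach 1.0 as the distributions separate.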