Deep learning is the state of the art for medical imaging tasks, but it requires large, labeled datasets. For risk prediction, large datasets are rare because they require both imaging and follow-up (e.g., diagnosis codes). However, the release of publicly available imaging data with diagnostic labels presents an opportunity for self- and semi-supervised approaches to improve label efficiency for risk prediction. Although several studies have compared self-supervised approaches in natural image classification, object detection, and medical image interpretation, there is limited evidence on which approaches learn robust representations for risk prediction. We present a comparison of semi- and self-supervised learning for predicting mortality risk from chest X-ray images. We find that a semi-supervised autoencoder outperforms contrastive and transfer learning in internal and external validation.