As deep learning has become the state of the art for computer-assisted diagnosis, the interpretability of automated decisions is crucial for clinical deployment. While various methods have been proposed in this domain, clinicians' visual attention maps during radiological screening offer a unique asset: they provide important insights and can potentially improve the quality of computer-assisted diagnosis. In this paper, we introduce a novel deep-learning framework for joint disease diagnosis and prediction of the corresponding visual saliency maps for chest X-ray scans. Specifically, we design a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for saliency map prediction, together with a multi-scale feature-fusion classifier to perform disease classification. To tackle the asynchronous training schedules of the individual tasks in multi-task learning, we propose a multi-stage cooperative learning strategy that uses contrastive learning to pretrain the feature encoder and boost performance. Experiments show that our method outperforms existing techniques in both chest X-ray diagnosis and the quality of visual saliency map prediction.
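To make the "Residual and Squeeze-and-Excitation block" more concrete, the following is a minimal NumPy sketch of a residual block whose branch output is recalibrated by squeeze-and-excitation before the skip connection. It is not the authors' implementation: the single 1x1 convolution (the hypothetical helper `conv1x1`), the reduction ratio `r`, and the random weights are simplifying assumptions, and real blocks typically use 3x3 convolutions with normalization.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Recalibrate the channels of a feature map x with shape (C, H, W).

    w1: (C // r, C), b1: (C // r,)  -- bottleneck layer, reduction ratio r
    w2: (C, C // r), b2: (C,)       -- expansion layer back to C channels
    """
    z = x.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)           # ReLU bottleneck -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # sigmoid gates in (0, 1) -> (C,)
    return x * s[:, None, None]                # excitation: scale each channel by its gate

def conv1x1(x, w):
    """Hypothetical stand-in for the block's convolutions: a 1x1 conv mixing channels."""
    return np.einsum("oc,chw->ohw", w, x)

def residual_se_block(x, w_conv, se_params):
    """Residual branch -> SE recalibration -> identity skip-add -> ReLU."""
    y = np.maximum(0.0, conv1x1(x, w_conv))
    y = squeeze_excitation(y, *se_params)
    return np.maximum(0.0, x + y)

rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
se_params = (
    rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r),
    rng.standard_normal((C, C // r)) * 0.1, np.zeros(C),
)
out = residual_se_block(x, rng.standard_normal((C, C)) * 0.1, se_params)
print(out.shape)  # (8, 16, 16)
```

The SE gates only rescale channels and leave the spatial layout untouched, which is why the block can be dropped into a UNet-style encoder without changing feature-map shapes.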
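The abstract mentions contrastive learning for pretraining the feature encoder but does not name the objective. As an illustrative assumption only, the sketch below uses the NT-Xent loss popularized by SimCLR: each embedding must identify its augmented partner among all other embeddings in the batch. The function name `nt_xent` and the temperature `tau=0.5` are hypothetical choices, not details from the paper.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); z1 and z2 have shape (N, D)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize -> cosine similarity
    sim = z @ z.T / tau                                # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own candidate
    n = z1.shape[0]
    # Row i's positive is its partner view: i + n for the first half, i - n for the second.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Cross-entropy of the positive against all other samples, via a stable log-sum-exp.
    m = sim.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
z1 = rng.standard_normal((4, 16))
aligned = nt_xent(z1, z1 + 0.01 * rng.standard_normal((4, 16)))
unrelated = nt_xent(z1, rng.standard_normal((4, 16)))
print(aligned, unrelated)
```

Pairs whose two views map to nearby embeddings yield a lower loss than unrelated pairs, which is the signal that shapes the encoder before the supervised diagnosis and saliency stages.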