Deep learning models have achieved remarkable success across a range of medical image analysis tasks, but deploying them in real clinical settings requires that they be robust to variability in the acquired images. Many methods augment the training data with predefined transformations to improve test-time robustness, yet such transformations may not ensure robustness to the diverse variability seen in patient images. In this paper, we introduce a novel three-stage approach based on transformers coupled with conditional diffusion models, with the goal of improving robustness to the kinds of imaging variability commonly encountered in practice without relying on predetermined data augmentation strategies. First, multiple image encoders learn hierarchical feature representations to build discriminative latent spaces. Next, a reverse diffusion process, guided by the latent code, acts on an informative prior and proposes prediction candidates in a generative manner. Finally, several prediction candidates are combined through a bi-level aggregation protocol to produce the final output. Through extensive experiments on medical imaging benchmark datasets, we show that our method improves upon state-of-the-art methods in terms of robustness and confidence calibration. Additionally, we introduce a strategy to quantify prediction uncertainty at the instance level, increasing the trustworthiness of the predictions for clinicians who rely on them in practice.