While deep learning models have achieved remarkable success across a range of medical image analysis tasks, deploying them in real clinical contexts requires that they be robust to variability in the acquired images. Many methods apply predefined transformations to augment the training data and improve test-time robustness, but these transformations may not cover the diverse variability seen in patient images. In this paper, we introduce a novel three-stage approach based on transformers coupled with conditional diffusion models, with the goal of improving robustness to the kinds of imaging variability commonly encountered in practice without the need for predefined data augmentation strategies. First, multiple image encoders learn hierarchical feature representations to build discriminative latent spaces. Next, a reverse diffusion process, guided by the latent code, acts on an informative prior and proposes prediction candidates in a generative manner. Finally, several prediction candidates are combined through a bi-level aggregation protocol to produce the final output. Through extensive experiments on medical imaging benchmark datasets, we show that our method improves upon state-of-the-art methods in terms of robustness and confidence calibration. Additionally, we introduce a strategy to quantify prediction uncertainty at the instance level, making the predictions more trustworthy for clinicians who rely on them in practice.
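The abstract does not spell out the aggregation protocol or the uncertainty measure, so the following is only a rough illustration of the general idea of combining several generated prediction candidates and attaching an instance-level uncertainty score. It assumes each candidate is a class-probability vector, and it substitutes plain averaging, predictive entropy, and candidate variance for the paper's bi-level protocol and uncertainty strategy. The function names `aggregate_candidates` and `instance_uncertainty` are hypothetical.

```python
import math
from statistics import mean, pvariance


def aggregate_candidates(candidates):
    """Average K candidate probability vectors into one prediction.

    Assumption: plain averaging stands in for the paper's bi-level
    aggregation protocol, which the abstract does not describe.
    """
    n_classes = len(candidates[0])
    avg = [mean(c[j] for c in candidates) for j in range(n_classes)]
    total = sum(avg)
    return [p / total for p in avg]  # renormalise to sum to 1


def instance_uncertainty(candidates):
    """Return (predicted label, predictive entropy, candidate spread).

    Assumption: entropy of the averaged prediction and the variance of
    the winning class across candidates are generic uncertainty proxies,
    not the strategy proposed in the paper.
    """
    probs = aggregate_candidates(candidates)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    label = max(range(len(probs)), key=probs.__getitem__)
    spread = pvariance([c[label] for c in candidates])
    return label, entropy, spread


# Three hypothetical candidates for one image over three classes.
candidates = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
label, entropy, spread = instance_uncertainty(candidates)
print(label, round(entropy, 4), round(spread, 5))
```

Under these assumptions, an instance whose candidates disagree strongly shows a larger spread and typically a higher entropy, and could be flagged for review by a clinician.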