Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures that are robust to the unique complexities posed by medical imaging data. Rapid advancements in vision-language foundation models within the natural image domain raise the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (e.g., medical devices) and disease features, enabling the removal and addition of specific attributes while preserving all other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.