We present Causal-Adapter, a modular framework that adapts frozen text-to-image diffusion backbones for counterfactual image generation. Our method enables causal interventions on target attributes, consistently propagating their effects to causal dependents without altering the core identity of the image. In contrast to prior approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter leverages structural causal modeling augmented with two attribute regularization strategies: prompt-aligned injection, which aligns causal attributes with textual embeddings for precise semantic control, and a conditioned token contrastive loss to disentangle attribute factors and reduce spurious correlations. Causal-Adapter achieves state-of-the-art performance on both synthetic and real-world datasets, with up to 91% MAE reduction on Pendulum for accurate attribute control and 87% FID reduction on ADNI for high-fidelity MRI image generation. These results show that our approach enables robust, generalizable counterfactual editing with faithful attribute modification and strong identity preservation.