Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require a target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations because inversion into noise-latent representations is imperfect. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods on challenging medical image synthesis tasks in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.