Cross-modal image translation remains brittle and inefficient. Standard diffusion approaches often rely on a single, global linear transfer between domains. We find that this shortcut forces the sampler to traverse off-manifold, high-cost regions, inflating the correction burden and inviting semantic drift. We refer to this shared failure mode as fixed-schedule domain transfer. In this paper, we embed domain-shift dynamics directly into the generative process. Our model predicts a spatially varying mixing field at every reverse step and injects an explicit, target-consistent restoration term into the drift. This in-step guidance keeps large updates on-manifold and shifts the model's role from global alignment to local residual correction. We provide a continuous-time formulation with an exact solution form and derive a practical first-order sampler that preserves marginal consistency. Empirically, across translation tasks in medical imaging, remote sensing, and electroluminescence semantic mapping, our framework improves structural fidelity and semantic consistency while converging in fewer denoising steps.
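The abstract's core mechanism — a first-order reverse step whose drift adds an explicit, target-consistent restoration term weighted by a predicted spatially varying mixing field — can be sketched as follows. This is a minimal illustration under assumed interfaces: `score`, `mix_field`, and `target_est` stand in for network outputs, and all names are hypothetical, not the authors' API.

```python
import numpy as np

def reverse_step(x, score, mix_field, target_est, dt, guidance=1.0):
    """One first-order (Euler-style) reverse update: the usual denoising
    drift plus an in-step, target-consistent restoration term whose
    per-pixel strength is the predicted mixing field. Illustrative only."""
    # Standard denoising drift (placeholder: a learned score estimate).
    drift = score
    # In-step guidance: pull each pixel toward the target-domain estimate,
    # modulated by the spatially varying mixing field.
    drift = drift + guidance * mix_field * (target_est - x)
    return x + drift * dt

# Toy usage with dummy predictions on a single-channel 4x4 "image".
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
score = -x                        # stand-in for a learned score estimate
mix = rng.uniform(0, 1, (4, 4))   # spatially varying mixing field in [0, 1]
target = np.zeros((4, 4))         # stand-in target-domain estimate
x_next = reverse_step(x, score, mix, target, dt=0.1)
```

Because the restoration term acts inside each step rather than as a one-shot global transfer, large updates stay anchored to the target estimate pixel-by-pixel, leaving the model only a local residual to correct.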