Diffusion models exhibit remarkable generative ability, yet smooth and semantically consistent image morphing remains challenging. Existing approaches often produce abrupt transitions or over-saturated appearances because they lack adaptive structural and semantic alignment. We propose CHIMERA, a zero-shot diffusion-based framework that formulates morphing as a cached inversion-guided denoising process. To handle large semantic and appearance disparities, we introduce two components: Adaptive Cache Injection and Semantic Anchor Prompting. Adaptive Cache Injection (ACI) caches down-, mid-, and up-block features from both inputs during DDIM inversion and re-injects them adaptively during denoising. This aligns the inputs spatially and semantically in a depth- and time-adaptive manner, yielding natural feature fusion and smooth transitions. Semantic Anchor Prompting (SAP) uses a vision-language model to generate a shared anchor prompt that bridges dissimilar inputs and guides the denoising process toward coherent results. Finally, we introduce the Global-Local Consistency Score (GLCS), a morphing-oriented metric that jointly evaluates the global harmonization of the two inputs and the local smoothness of the morphing transition. Extensive experiments and user studies show that CHIMERA achieves smoother and more semantically aligned transitions than existing methods, establishing a new state of the art in image morphing. The code and project page will be publicly released.
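The cache-and-re-inject idea behind ACI can be illustrated with a toy sketch. The abstract does not specify the blending rule or the adaptive schedule, so everything below is an assumption made for illustration: the names `lerp`, `injection_strength`, and `inject`, the per-block weights, and the linear time schedule are hypothetical, and plain Python lists stand in for feature tensors.

```python
from typing import Dict, List, Tuple

Feature = List[float]                   # stand-in for a feature tensor
Cache = Dict[Tuple[str, int], Feature]  # (block name, timestep) -> cached feature


def lerp(a: Feature, b: Feature, w: float) -> Feature:
    """Element-wise (1 - w) * a + w * b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def injection_strength(t: int, num_steps: int, block: str) -> float:
    """Hypothetical schedule (an assumption, not taken from the paper).

    Larger t is treated as a noisier step, so injection is strongest at
    t = num_steps - 1 and vanishes at t = 0. The per-block factors are
    arbitrary illustrative values for the depth-adaptive part.
    """
    time_w = t / max(num_steps - 1, 1)
    depth_w = {"down": 1.0, "mid": 0.5, "up": 0.25}[block]
    return time_w * depth_w


def inject(current: Feature, cache_a: Cache, cache_b: Cache,
           block: str, t: int, num_steps: int, alpha: float) -> Feature:
    """Blend the two cached features at morph position alpha, then mix
    the blend into the current denoising feature."""
    target = lerp(cache_a[(block, t)], cache_b[(block, t)], alpha)
    strength = injection_strength(t, num_steps, block)
    return lerp(current, target, strength)


# Toy caches, as if recorded during inversion of input A and input B.
cache_a: Cache = {("down", 3): [0.0, 0.0], ("down", 0): [0.0, 0.0]}
cache_b: Cache = {("down", 3): [2.0, 4.0], ("down", 0): [2.0, 4.0]}

# Midpoint of the morph (alpha = 0.5) at the noisiest and the final step.
early = inject([5.0, 5.0], cache_a, cache_b, "down", t=3, num_steps=4, alpha=0.5)
late = inject([5.0, 5.0], cache_a, cache_b, "down", t=0, num_steps=4, alpha=0.5)
```

In this sketch the cached blend fully replaces the current feature at the noisiest step and leaves it untouched at the final step. A real implementation would operate on U-Net activations rather than lists, and its schedule may differ substantially from the one assumed here.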