Recent progress in image-to-video (I2V) diffusion models has significantly advanced generative inbetweening, which aims to generate semantically plausible frames between two keyframes. In particular, inference-time sampling strategies, which leverage the generative priors of large-scale pre-trained I2V models without additional training, have become increasingly popular. However, existing inference-time sampling methods, which either fuse forward and backward paths in parallel or alternate them sequentially, often suffer from temporal discontinuities and undesirable visual artifacts caused by misalignment between the two generated paths. This misalignment arises because each path follows the motion prior induced by its own conditioning frame. In this work, we propose Motion Prior Distillation (MPD), a simple yet effective inference-time distillation technique that suppresses bidirectional mismatch by distilling the motion residual of the forward path into the backward path. Our method deliberately avoids denoising the end-conditioned path, a step that introduces path ambiguity, and yields more temporally coherent inbetweening results guided by the forward motion prior. We perform quantitative evaluations on standard benchmarks and conduct extensive user studies to demonstrate the effectiveness of our approach in practical scenarios.