Video frame interpolation aims to synthesize realistic intermediate frames between given endpoint frames while adhering to specified motion semantics. Recent generative models have improved visual fidelity, but they predominantly generate in a single temporal direction and lack a mechanism to self-verify temporal consistency. This often leads to motion drift, directional ambiguity, and boundary misalignment, especially in long-range sequences. Inspired by the principle of temporal cycle-consistency in self-supervised learning, we propose a bidirectional framework that enforces symmetry between forward and backward generation trajectories. Our approach introduces learnable directional tokens that explicitly condition a shared backbone on temporal orientation, so that a single architecture jointly optimizes forward synthesis and backward reconstruction. This cycle-consistent supervision acts as a regularizer that encourages generated motion paths to be reversible. Furthermore, we employ a curriculum learning strategy that progressively trains the model from short to long sequences, stabilizing dynamics across varying durations. Crucially, the cyclic constraints are applied only during training; inference requires a single forward pass, preserving the efficiency of the base model. Extensive experiments show that our method achieves state-of-the-art imaging quality, motion smoothness, and dynamic control on both 37-frame and 73-frame tasks, outperforming strong baselines while adding no computational overhead at inference.
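To make the training-time mechanism concrete, the following is a minimal toy sketch, not the method's actual implementation. It assumes scalar "frames", a hand-written stand-in for the shared backbone, and a scalar per-direction token; the names `toy_backbone`, `cycle_consistency_loss`, and `curriculum_length` are hypothetical, and the short curriculum lengths 9 and 17 are placeholders (only the 37-frame and 73-frame settings come from the text above).

```python
# Toy sketch only: frames are scalars, the "backbone" is a hand-written function,
# and every name here is hypothetical rather than taken from the paper.

FORWARD, BACKWARD = 0, 1


def toy_backbone(start, end, num_frames, direction, direction_tokens):
    """Interpolate from `start` to `end`, conditioned on a per-direction scalar token.

    The token bends the interior of the trajectory so the two directions can
    disagree; the endpoints are always reproduced exactly.
    """
    token = direction_tokens[direction]
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        linear = (1.0 - t) * start + t * end
        bend = token * t * (1.0 - t)  # zero at both endpoints
        frames.append(linear + bend)
    return frames


def cycle_consistency_loss(first, last, num_frames, direction_tokens):
    """Mean squared gap between the forward pass and the time-reversed backward pass."""
    forward = toy_backbone(first, last, num_frames, FORWARD, direction_tokens)
    backward = toy_backbone(last, first, num_frames, BACKWARD, direction_tokens)
    reversed_backward = backward[::-1]
    squared_gaps = ((f - b) ** 2 for f, b in zip(forward, reversed_backward))
    return sum(squared_gaps) / num_frames


def curriculum_length(step, total_steps, lengths=(9, 17, 37, 73)):
    """Pick a sequence length that grows from short to long as training advances.

    The 9- and 17-frame stages are assumed placeholders for illustration.
    """
    stage = min(step * len(lengths) // total_steps, len(lengths) - 1)
    return lengths[stage]
```

In this sketch the loss vanishes when the two directions trace the same path in opposite order and grows as they diverge, which is the symmetry the abstract describes. At inference only `toy_backbone(..., FORWARD, ...)` would be called once, so the cycle term adds no cost there.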