We introduce bounded generation as a generalized task for controlling video generation: synthesizing arbitrary camera and subject motion based only on a given start frame and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. We achieve this with a new sampling strategy, which we call Time Reversal Fusion, that fuses the temporally forward and backward denoising paths conditioned on the start and end frames, respectively. The fused path yields a video that smoothly connects the two frames, producing inbetweening with faithful subject motion, novel views of static scenes, and seamless video loops when the two bounding frames are identical. We curate a diverse evaluation dataset of image pairs and compare against the closest existing methods. We find that Time Reversal Fusion outperforms related work on all subtasks, exhibiting the ability to generate complex motions and 3D-consistent views guided by the bounding frames. See the project page at https://time-reversal.github.io.
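To make the idea of fusing a forward and a time-reversed denoising path concrete, here is a toy sketch, not the authors' implementation. It makes several assumptions: each frame is reduced to a single scalar "latent", the image-to-video model is replaced by a hypothetical conditioned update `toy_i2v_step` that pulls frames toward the conditioning frame (more strongly near frame 0, where the conditioning image sits), and the per-frame fusion weights are assumed to ramp linearly from 1 at the start frame to 0 at the end frame. Real models operate on latent tensors with a noise schedule; only the structure of the fusion step is illustrated.

```python
import random
from typing import Callable, List

Video = List[float]  # toy representation: one scalar latent per frame


def toy_i2v_step(latents: Video, cond: float, alpha: float = 0.5, decay: float = 0.5) -> Video:
    """Hypothetical stand-in for one denoising step of an image-to-video model.

    Each frame moves toward the conditioning value; the pull weakens
    with distance from frame 0, where the conditioning image is placed.
    """
    return [x + alpha * decay ** i * (cond - x) for i, x in enumerate(latents)]


def time_reversal_fusion(
    start: float,
    end: float,
    num_frames: int,
    num_steps: int,
    step: Callable[[Video, float], Video] = toy_i2v_step,
    seed: int = 0,
) -> Video:
    """Toy sketch: fuse forward (start-conditioned) and backward (end-conditioned) paths."""
    assert num_frames >= 2, "need at least a start and an end frame"
    rng = random.Random(seed)
    latents = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]  # initial noise
    # Assumed fusion weights: 1.0 at the first frame, decreasing linearly to 0.0 at the last.
    weights = [1.0 - i / (num_frames - 1) for i in range(num_frames)]
    for _ in range(num_steps):
        # Forward path: frames in natural order, conditioned on the start frame.
        fwd = step(latents, start)
        # Backward path: reverse time, condition on the end frame, then reverse back.
        bwd = step(latents[::-1], end)[::-1]
        # Per-frame weighted average of the two paths.
        latents = [w * f + (1.0 - w) * b for w, f, b in zip(weights, fwd, bwd)]
    return latents


if __name__ == "__main__":
    video = time_reversal_fusion(start=0.0, end=1.0, num_frames=5, num_steps=200)
    print([round(v, 3) for v in video])
```

In this toy setting the fused sequence starts at the start value, ends at the end value, and rises smoothly in between. Neither path alone would do this: the forward path drifts every frame toward the start frame, and the backward path drifts every frame toward the end frame.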