SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work.

Project page: https://zheninghuang.github.io/Space-Time-Pilot/
Code: https://github.com/ZheningHuang/spacetimepilot
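
To make the temporal-warping idea more concrete, the sketch below shows one way a synchronized two-view clip might be turned into a training pair whose target plays back on a different timeline than the source. The abstract does not specify the warp functions or the data layout, so everything here is an assumption for illustration: the names `warp_indices`, `WARPS`, and `make_training_pair` are hypothetical, the four warps (identity, reverse, slow motion, freeze) are illustrative choices, and frames are stood in for by plain strings.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical warp functions on normalized time t in [0, 1].
# These are illustrative choices, not the warps used by the authors.
WARPS: Dict[str, Callable[[float], float]] = {
    "identity": lambda t: t,           # target follows the source timeline
    "reverse": lambda t: 1.0 - t,      # target plays backwards
    "slow_motion": lambda t: 0.5 * t,  # target covers the first half at half speed
    "freeze": lambda t: 0.5,           # target holds the middle frame
}


def warp_indices(num_frames: int, warp: Callable[[float], float]) -> List[int]:
    """Return, for each output frame, the source frame index chosen by `warp`."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    last = num_frames - 1
    indices = []
    for i in range(num_frames):
        t = i / last if last > 0 else 0.0
        s = min(max(warp(t), 0.0), 1.0)      # clamp warped time to [0, 1]
        indices.append(int(s * last + 0.5))  # round half up to a frame index
    return indices


def make_training_pair(
    source_view: Sequence[str],
    target_view: Sequence[str],
    warp_name: str,
) -> dict:
    """Build a (source, warped target) pair from two synchronized views.

    Assumes both views have the same number of frames and that frame i of each
    view shows the same instant of the scene from a different camera.
    """
    if len(source_view) != len(target_view):
        raise ValueError("views must be synchronized and equally long")
    idx = warp_indices(len(target_view), WARPS[warp_name])
    return {
        "source": list(source_view),
        "target": [target_view[i] for i in idx],
        "time_indices": idx,  # could serve as the time-conditioning signal
    }


if __name__ == "__main__":
    cam_a = [f"a{i}" for i in range(5)]
    cam_b = [f"b{i}" for i in range(5)]
    pair = make_training_pair(cam_a, cam_b, "reverse")
    print(pair["time_indices"], pair["target"])
```

In this sketch the camera change comes from using a second view and the time change comes from re-indexing its frames, so the two factors vary independently within one training pair. How the actual model consumes the time indices (for example through its time-embedding mechanism) is not described in the abstract and is left out here.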
