This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamics and a limited range of viewpoints when generating videos with large camera movement. We take an approach that progressively expands the generation of dynamic scenes -- first enhancing dynamic content within individual video clips, then extending this capability to create seamless explorations across broad viewpoint ranges. Specifically, we construct a dataset featuring a high degree of dynamics with camera parameter annotations for training, and design a lightweight camera injection module and training scheme that preserve the dynamics of the pretrained model. Building on these improved single-clip techniques, we enable extended scene exploration by allowing users to iteratively specify camera trajectories for generating coherent video sequences. Experiments across diverse scenarios demonstrate that CameraCtrl II enables camera-controlled dynamic scene synthesis with substantially wider spatial exploration than previous approaches.