Controllability plays a crucial role in video generation because it allows users to create the content they want. However, existing models have largely overlooked precise control of camera pose, which serves as a cinematic language for expressing deeper narrative nuances. To address this issue, we introduce CameraCtrl, which enables accurate camera pose control for text-to-video (T2V) models. After precisely parameterizing the camera trajectory, we train a plug-and-play camera module on a T2V model, leaving the other components untouched. We also conduct a comprehensive study of the effect of various datasets, which suggests that videos with a diverse camera distribution and similar appearance enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. Our project website is at: https://hehao13.github.io/projects-CameraCtrl/.