We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth estimation artifacts in real-world dynamic videos, and they often fail to preserve content appearance or to maintain precise camera control on challenging new trajectories. We build a 4D-grounded point cloud representation with static pixel segmentation and 4D reconstruction to explicitly preserve seen content and provide rich camera signals, and we train with reconstructed multiview dynamic data for robustness against point cloud artifacts during real-world inference. Our results demonstrate improved 4D consistency, camera control, and visual quality compared to state-of-the-art baselines across a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: https://eyeline-labs.github.io/Vista4D
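To make the idea of grounding a frame and a target camera in a point cloud concrete, the following is a minimal sketch of the generic geometry involved, not the Vista4D implementation. It assumes a pinhole camera with intrinsics `K`, a per-pixel depth map, and 4x4 pose matrices; the function names `unproject` and `render_points` are hypothetical. It lifts one frame to world-space points and splats them into a target view, returning a mask of pixels that received no point. Static pixel segmentation, 4D reconstruction across time, and the generative model that fills unseen regions are all outside this sketch.

```python
import numpy as np

def unproject(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-space points of shape (H*W, 3).
    Assumes a pinhole model; depth is the z-coordinate in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

def render_points(points, colors, K, world_to_cam, h, w):
    """Project world points into a target camera (hypothetical helper).
    Returns (image, mask); mask is False where no point landed."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    pts_cam = (pts_h @ world_to_cam.T)[:, :3]
    z = pts_cam[:, 2]
    valid = z > 1e-6                           # keep points in front of the camera
    proj = pts_cam[valid] @ K.T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    z, cols = z[valid], colors[valid]
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, z, cols = uv[inside], z[inside], cols[inside]
    # Write far-to-near so that nearer points overwrite farther ones; this
    # relies on NumPy keeping the last value assigned to a repeated index.
    order = np.argsort(-z, kind="stable")
    image = np.zeros((h, w, 3), dtype=colors.dtype)
    mask = np.zeros((h, w), dtype=bool)
    image[uv[order, 1], uv[order, 0]] = cols[order]
    mask[uv[order, 1], uv[order, 0]] = True
    return image, mask

# Toy usage: a fronto-parallel plane at depth 2, viewed from a shifted camera.
h, w = 4, 5
K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((h, w), 2.0)
colors = np.arange(h * w * 3, dtype=np.float64).reshape(-1, 3)
points = unproject(depth, K, np.eye(4))
target_pose = np.eye(4)
target_pose[0, 3] = -0.2                       # camera moved 0.2 units along +x
image, mask = render_points(points, colors, K, target_pose, h, w)
```

In this toy setup the content shifts by one pixel and the rightmost column of `mask` is empty. Such holes, together with errors in the estimated depth, are the kind of point cloud artifacts that a reshooting model must tolerate at inference time.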