Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal view of a dynamic 3D scene, providing severely limited observations of the underlying 4D world. The key challenge is therefore to recover a complete and coherent representation from this limited input, with consistent geometry and motion. While recent diffusion-based methods achieve impressive visual generation quality, they often break down under large-angle viewpoint changes far from the original trajectory, where missing visual grounding leads to severe geometric ambiguity and temporal inconsistency. We present FreeOrbit4D, an effective training-free framework that tackles this ambiguity by recovering a foreground-complete 4D proxy as structural grounding for video generation. We obtain this proxy by decoupling foreground and background reconstructions: we unproject the monocular video into a static background and partial foreground point clouds in a unified global space, then use an object-centric multi-view diffusion model to synthesize multi-view images and reconstruct complete foreground point clouds in canonical object space. By aligning the canonical foreground point cloud to the global scene space via dense pixel-synchronized 3D-3D correspondences and projecting the foreground-complete 4D proxy onto target camera viewpoints, we provide geometric scaffolds that guide a conditional video diffusion model. Extensive experiments show that FreeOrbit4D produces more faithful and temporally coherent redirected videos under challenging large-angle trajectories, and our proxy further enables applications such as edit propagation and 4D data generation. Project page: https://freeorbit4d.vision.ischool.illinois.edu/
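The geometric steps named above (unprojecting a frame into global-space points, aligning canonical-space foreground points to that space from 3D-3D correspondences, and projecting the aligned proxy onto a target viewpoint) can be sketched as follows. This is a minimal NumPy sketch under our own assumptions, not the FreeOrbit4D implementation. We assume a pinhole camera with intrinsics `K` and 4x4 pose matrices. We also assume the canonical-to-global alignment is a single similarity transform (scale, rotation, translation) solved in closed form with the Umeyama least-squares method; the abstract does not name a solver, and a dynamic scene would need one such transform per frame. Finally, a nearest-point z-buffer stands in for whatever renderer produces the scaffolds. The function names `unproject`, `umeyama_sim3`, and `render_depth` are hypothetical.

```python
import numpy as np


def unproject(depth: np.ndarray, K: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """Lift an (H, W) depth map to (H*W, 3) world-space points (pinhole model, assumed)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # pixel rows (v) and columns (u)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T  # camera-space directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)  # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares s, R, t with dst_i ~= s * R @ src_i + t from paired (N, 3) points."""
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n  # 3x3 cross-covariance between target and source
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0  # flip one axis so R is a proper rotation, not a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / n
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def render_depth(points: np.ndarray, K: np.ndarray, world_to_cam: np.ndarray,
                 h: int, w: int) -> np.ndarray:
    """Splat points into an (h, w) depth map, keeping the nearest point per pixel (inf = empty)."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (pts_h @ world_to_cam.T)[:, :3]
    cam = cam[cam[:, 2] > 1e-6]  # keep only points in front of the camera
    uvw = cam @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.full((h, w), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), cam[ok, 2])  # z-buffer: closest point wins
    return depth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
    depth = rng.uniform(2.0, 4.0, size=(48, 64))
    # Stand-in for partial foreground points unprojected from one source frame.
    partial_fg = unproject(depth, K, np.eye(4))
    # Stand-in for canonical-space points: an arbitrary similarity transform of the
    # same points, so each row of `canonical` corresponds to the same row of `partial_fg`.
    canonical = 0.5 * partial_fg @ np.diag([1.0, -1.0, -1.0]) + 0.1
    s, R, t = umeyama_sim3(canonical, partial_fg)  # canonical -> global space
    aligned = s * canonical @ R.T + t
    # Hypothetical target camera: the source camera shifted 0.5 units along x.
    world_to_cam = np.eye(4)
    world_to_cam[0, 3] = -0.5
    scaffold = render_depth(aligned, K, world_to_cam, 48, 64)
    print("alignment RMSE:", np.sqrt(((aligned - partial_fg) ** 2).sum(axis=1).mean()))
    print("covered pixels:", int(np.isfinite(scaffold).sum()))
```

Keeping the per-pixel minimum means occluded points never overwrite visible ones in the target view. A fuller pipeline would also carry colors, handle holes, and obtain correspondences from matching rather than from identical point sets as in this toy demo.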