Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal view of a dynamic 3D scene, providing highly partial observations of the underlying 4D world. The key challenge is therefore to recover a complete and coherent representation from this limited input, with consistent geometry and motion. While recent diffusion-based methods achieve impressive results, they often break down under large-angle viewpoint changes far from the original trajectory, where missing visual grounding leads to severe geometric ambiguity and temporal inconsistency. To address this, we present FreeOrbit4D, an effective training-free framework that tackles this geometric ambiguity by recovering a geometry-complete 4D proxy as structural grounding for video generation. We obtain this proxy by decoupling foreground and background reconstructions: we unproject the monocular video into a static background and geometry-incomplete foreground point clouds in a unified global space, then leverage an object-centric multi-view diffusion model to synthesize multi-view images and reconstruct geometry-complete foreground point clouds in canonical object space. By aligning the canonical foreground point cloud to the global scene space via dense pixel-synchronized 3D--3D correspondences and projecting the geometry-complete 4D proxy onto target camera viewpoints, we provide geometric scaffolds that guide a conditional video diffusion model. Extensive experiments show that FreeOrbit4D produces more faithful redirected videos under challenging large-angle trajectories, and our geometry-complete 4D proxy further opens a potential avenue for practical applications such as edit propagation and 4D data generation. Project page and code will be released soon.
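The alignment and projection steps described above can be sketched in code. This is an illustrative outline only, not the authors' released implementation: it assumes the canonical-to-global alignment is a similarity transform fit to the dense pixel-synchronized 3D--3D correspondences (here via the standard Umeyama closed-form solution) and that the 4D proxy is rendered into a target view with a simple pinhole camera model. Function names and signatures are hypothetical.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    mapping src points onto dst points, i.e. dst ~ s * R @ src + t.
    src, dst: (N, 3) arrays of corresponding 3D points
    (e.g. canonical foreground points and their global-space matches)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    # Cross-covariance between centered correspondences.
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection correction keeps R a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

def project_points(points, K, R_cam, t_cam):
    """Pinhole projection of world-space 3D points into a target camera.
    K: (3, 3) intrinsics; R_cam, t_cam: world-to-camera extrinsics.
    Returns (N, 2) pixel coordinates."""
    cam = points @ R_cam.T + t_cam          # world -> camera frame
    uv = cam @ K.T                          # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]           # perspective divide
```

In this sketch, the aligned foreground points `s * (R @ p) + t` would be merged with the static background cloud and passed through `project_points` for each target viewpoint, yielding the per-frame geometric scaffolds that condition the video diffusion model.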