Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal view of a dynamic 3D scene, providing highly partial observations of the underlying 4D world. The key challenge is therefore to recover a complete and coherent representation from this limited input, with consistent geometry and motion. While recent diffusion-based methods achieve impressive results, they often break down under large-angle viewpoint changes far from the original trajectory, where missing visual grounding leads to severe geometric ambiguity and temporal inconsistency. To address this, we present FreeOrbit4D, an effective training-free framework that tackles this geometric ambiguity by recovering a geometry-complete 4D proxy as structural grounding for video generation. We obtain this proxy by decoupling foreground and background reconstructions: we unproject the monocular video into a static background and geometry-incomplete foreground point clouds in a unified global space, then leverage an object-centric multi-view diffusion model to synthesize multi-view images and reconstruct geometry-complete foreground point clouds in canonical object space. By aligning the canonical foreground point cloud to the global scene space via dense pixel-synchronized 3D--3D correspondences and projecting the geometry-complete 4D proxy onto target camera viewpoints, we provide geometric scaffolds that guide a conditional video diffusion model. Extensive experiments show that FreeOrbit4D produces more faithful redirected videos under challenging large-angle trajectories, and our geometry-complete 4D proxy further opens a potential avenue for practical applications such as edit propagation and 4D data generation. Project page and code will be released soon.
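The alignment and projection steps described above can be sketched in code. This is an illustrative outline only, not the authors' released implementation: it assumes the canonical-to-global alignment is a similarity transform fit to the dense pixel-synchronized 3D--3D correspondences (here via the standard Umeyama closed-form solution) and that the 4D proxy is rendered into a target view with a simple pinhole camera model. Function names and signatures are hypothetical.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    mapping src points onto dst points, i.e. dst ~ s * R @ src + t.
    src, dst: (N, 3) arrays of corresponding 3D points
    (e.g. canonical foreground points and their global-space matches)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    # Cross-covariance between centered correspondences.
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection correction keeps R a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

def project_points(points, K, R_cam, t_cam):
    """Pinhole projection of world-space 3D points into a target camera.
    K: (3, 3) intrinsics; R_cam, t_cam: world-to-camera extrinsics.
    Returns (N, 2) pixel coordinates."""
    cam = points @ R_cam.T + t_cam          # world -> camera frame
    uv = cam @ K.T                          # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]           # perspective divide
```

In this sketch, the aligned foreground points `s * (R @ p) + t` would be merged with the static background cloud and passed through `project_points` for each target viewpoint, yielding the per-frame geometric scaffolds that condition the video diffusion model.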