FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction

Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal view of a dynamic 3D scene, providing severely limited observations of the underlying 4D world. The key challenge is therefore to recover a complete and coherent representation from this limited input, with consistent geometry and motion. While recent diffusion-based methods achieve impressive visual generation quality, they often break down under large-angle viewpoint changes far from the original trajectory, where missing visual grounding leads to severe geometric ambiguity and temporal inconsistency. We present FreeOrbit4D, an effective training-free framework that tackles this ambiguity by recovering a foreground-complete 4D proxy as structural grounding for video generation. We obtain this proxy by decoupling foreground and background reconstructions: we unproject the monocular video into a static background and partial foreground point clouds in a unified global space, then use an object-centric multi-view diffusion model to synthesize multi-view images and reconstruct complete foreground point clouds in canonical object space. By aligning the canonical foreground point cloud to the global scene space via dense pixel-synchronized 3D-3D correspondences and projecting the foreground-complete 4D proxy onto target camera viewpoints, we provide geometric scaffolds that guide a conditional video diffusion model. Extensive experiments show that FreeOrbit4D produces more faithful and temporally coherent redirected videos under challenging large-angle trajectories, and our proxy further enables applications such as edit propagation and 4D data generation. Project page: https://freeorbit4d.vision.ischool.illinois.edu/
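
The alignment and projection steps mentioned above can be illustrated with a minimal sketch. The abstract does not say how the transform between canonical object space and global scene space is estimated, so this example swaps in a standard closed-form least-squares similarity fit (Umeyama's method) over paired 3D points, followed by a plain pinhole projection. The function names `estimate_similarity` and `project_points`, the intrinsics matrix `K`, and the world-to-camera pose convention are assumptions made for illustration and are not taken from FreeOrbit4D.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (hypothetical inputs,
    e.g. canonical-space points and their matches in global scene space).
    Uses Umeyama's closed-form solution; not necessarily the paper's solver.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0

    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def project_points(points_world, K, R_wc, t_wc):
    """Pinhole projection of (N, 3) world points into a target camera.

    R_wc, t_wc map world coordinates to camera coordinates (assumed convention).
    Returns pixel coordinates of the points in front of the camera and the
    boolean mask that selects them.
    """
    cam = points_world @ R_wc.T + t_wc
    in_front = cam[:, 2] > 1e-6
    uvw = cam[in_front] @ K.T
    return uvw[:, :2] / uvw[:, 2:3], in_front
```

In this sketch, the canonical foreground points would be mapped with `s * points @ R.T + t` before being passed to `project_points` for each target camera pose. A robust variant (for example, wrapping the fit in RANSAC) may be preferable when correspondences are noisy.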

