Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatio-temporal consistency, requirements that remain challenging for perspective video generators because of their limited field of view (FoV). A narrow FoV forces long or multi-view camera trajectories, which amplify cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360 (Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion), a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, that serves as a geometric scaffold for any user-defined camera path. The 3D Cache enforces global geometric consistency, leaving the diffusion model free to focus on photorealistic texture refinement. Experiments show that Pantheon360 achieves superior visual quality and geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.