Unbounded 3D world generation is emerging as a foundational task for scene modeling in computer vision, graphics, and robotics. In this work, we present WorldFlow3D, a novel method capable of generating unbounded 3D worlds. Building upon a foundational property of flow matching, namely that it defines a path of transport between two data distributions, we model 3D generation more generally as a problem of flowing through 3D data distributions rather than restricting it to conditional denoising. We find that our latent-free flow approach generates causal and accurate 3D structure, and that this structure can serve as an intermediate distribution to guide the generation of more complex structure and high-quality texture, all while converging more rapidly than existing methods. We make generated scenes controllable: vectorized scene layout conditions control geometric structure, and scene attributes control visual texture. We evaluate WorldFlow3D on both real outdoor driving scenes and synthetic indoor scenes, validating cross-domain generalizability and high-quality generation on real data distributions. In all tested settings for unbounded scene generation, WorldFlow3D achieves favorable scene generation fidelity compared with existing approaches. For more, see https://light.princeton.edu/worldflow3d.
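To make the flow-matching property referenced above concrete, the following is a minimal sketch of a transport path between paired samples from two distributions. It assumes the commonly used straight-line (linear) interpolation path, which the abstract does not specify, and it is not the WorldFlow3D implementation. The function names `interpolate`, `target_velocity`, and `euler_transport` are hypothetical, and the toy arrays stand in for 3D data.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1).

    Assumption: a linear path, one common choice in flow matching.
    """
    # Reshape t so a scalar or per-sample vector broadcasts over the data dims.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Regression target for the velocity field along the linear path."""
    return x1 - x0


def euler_transport(x0, velocity_fn, steps=10):
    """Move samples from the source distribution by integrating a velocity field.

    velocity_fn(x, t) would normally be a trained network; here it is any callable.
    """
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: a "coarse" source sample set and a "detailed" target set.
    x0 = rng.normal(size=(4, 3))
    x1 = x0 + rng.normal(scale=0.1, size=(4, 3))

    # Training would regress a model onto target_velocity at random times t.
    t = rng.uniform(size=4)
    x_t = interpolate(x0, x1, t)
    v_target = target_velocity(x0, x1)

    # With the exact (oracle) velocity, Euler integration recovers x1.
    x_end = euler_transport(x0, lambda x, t: v_target, steps=8)
    print(np.allclose(x_end, x1))
```

Because the source samples here need not be Gaussian noise, the same construction can in principle transport one data distribution to another, which is the generalization beyond conditional denoising that the abstract describes.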