SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images followed by a feedforward appearance prediction model. The diffusion model generates high-resolution grids progressively in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as 3 non-overlapping input images, SCube can generate millions of Gaussians with a 1024^3 voxel grid spanning hundreds of meters in 20 seconds. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.
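To make the representation described above more concrete, here is a minimal sketch of a container that stores a few Gaussians per occupied voxel of a sparse grid. It is not the authors' implementation. The class and field names (`SparseVoxelSplats`, `Gaussian`, `offset`), the 0.1 m voxel size, and the convention that a Gaussian's position is an offset from its voxel center in voxel units are all assumptions made for illustration. Rotation and semantic labels are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

VoxelIndex = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    # Hypothetical parameterization: offset from the voxel center, in voxel units.
    offset: Vec3
    scale: Vec3
    opacity: float
    color: Vec3


@dataclass
class SparseVoxelSplats:
    resolution: int      # grid side length, e.g. 1024 for a 1024^3 grid
    voxel_size: float    # edge length of one voxel in meters (assumed value)
    voxels: Dict[VoxelIndex, List[Gaussian]] = field(default_factory=dict)

    def add(self, index: VoxelIndex, gaussian: Gaussian) -> None:
        """Attach a Gaussian to a voxel, creating the voxel entry if needed."""
        if not all(0 <= i < self.resolution for i in index):
            raise ValueError(f"voxel index {index} is outside the grid")
        self.voxels.setdefault(index, []).append(gaussian)

    def world_centers(self) -> List[Vec3]:
        """World-space center of every Gaussian, in insertion order."""
        centers = []
        for index, gaussians in self.voxels.items():
            for g in gaussians:
                centers.append(tuple(
                    (i + 0.5 + o) * self.voxel_size
                    for i, o in zip(index, g.offset)
                ))
        return centers

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.voxels) / self.resolution ** 3


# A dense 1024^3 grid has over a billion cells; only occupied ones are stored here.
dense_cells = 1024 ** 3

scene = SparseVoxelSplats(resolution=1024, voxel_size=0.1)
scene.add((0, 0, 0), Gaussian((0.0, 0.0, 0.0), (0.05, 0.05, 0.05), 0.9, (0.5, 0.5, 0.5)))
scene.add((10, 20, 30), Gaussian((0.25, -0.25, 0.0), (0.05, 0.05, 0.05), 0.8, (0.2, 0.6, 0.3)))
centers = scene.world_centers()
```

Keying the dictionary by integer voxel index means memory grows with the number of occupied voxels rather than with the full grid volume, which is the property that makes a high-resolution scaffold practical in this sketch.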

