SemCity: Semantic Scene Generation with Triplane Diffusion

We present SemCity, a 3D diffusion model for semantic scene generation in real-world outdoor environments. Most 3D diffusion models focus on generating single objects, synthetic indoor scenes, or synthetic outdoor scenes, while the generation of real-world outdoor scenes is rarely addressed. In this paper, we concentrate on generating real-world outdoor scenes by training a diffusion model on a real-world outdoor dataset. Unlike synthetic data, real-world outdoor datasets often contain more empty space due to sensor limitations, which makes their distributions harder to learn. To address this issue, we use a triplane representation as a proxy form of the scene distribution to be learned by our diffusion model. Furthermore, we propose a triplane manipulation that integrates seamlessly with our triplane diffusion model. The manipulation extends our diffusion model to a variety of downstream tasks related to outdoor scene generation, such as scene inpainting, scene outpainting, and semantic scene completion refinement. Our experiments demonstrate that our triplane diffusion model produces meaningful generation results compared with existing work on a real-world outdoor dataset, SemanticKITTI. We also show that our triplane manipulation facilitates seamlessly adding, removing, or modifying objects within a scene, and that it enables the expansion of scenes toward city-level scale. Finally, we evaluate our method on semantic scene completion refinement, where our diffusion model, having learned the scene distribution, enhances the predictions of semantic scene completion networks. Our code is available at https://github.com/zoomin-lee/SemCity.
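To make the idea of a triplane as a compact proxy for a 3D scene more concrete, here is a minimal, generic sketch. It is not the SemCity implementation. Assumptions: the scene is a dense voxel feature grid of shape (R, R, R, C); axis-wise mean pooling stands in for a learned encoder; a point's feature is the sum of its three nearest-neighbour plane features. The function names `voxels_to_triplane` and `query_triplane` are hypothetical.

```python
import numpy as np

def voxels_to_triplane(features):
    """Collapse an (R, R, R, C) voxel grid indexed [x, y, z] into three planes.

    Hypothetical stand-in for a learned encoder: each plane is the mean
    over the axis it does not contain.
    """
    return {
        "xy": features.mean(axis=2),  # collapse z -> (R, R, C) indexed [x, y]
        "xz": features.mean(axis=1),  # collapse y -> (R, R, C) indexed [x, z]
        "yz": features.mean(axis=0),  # collapse x -> (R, R, C) indexed [y, z]
    }

def query_triplane(planes, points):
    """Return (N, C) features for (N, 3) points with coordinates in [0, 1).

    Assumption: nearest-neighbour lookup and summation of the three plane
    features (no bilinear interpolation, no learned decoder).
    """
    res = planes["xy"].shape[0]
    idx = np.clip((points * res).astype(int), 0, res - 1)
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    return planes["xy"][x, y] + planes["xz"][x, z] + planes["yz"][y, z]

# Usage: a random 8x8x8 grid with 4 feature channels.
rng = np.random.default_rng(0)
R, C = 8, 4
vox = rng.normal(size=(R, R, R, C))
tri = voxels_to_triplane(vox)
pts = rng.random((5, 3))
feats = query_triplane(tri, pts)
print(feats.shape)  # (5, 4)
print(vox.size, sum(p.size for p in tri.values()))  # 2048 768
```

Storage drops from R³·C values for the voxel grid to 3·R²·C for the triplane, which is one reason a triplane is a compact target for a diffusion model. Triplane models commonly replace the nearest-neighbour lookup with bilinear sampling and decode the aggregated feature with a small MLP.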
