With rising industrial attention to 3D virtual modeling technology, generating novel 3D content from specified conditions (e.g., text) has become an active research topic. In this paper, we propose Diffusion-SDF, a new generative 3D modeling framework for the challenging task of text-to-shape synthesis. Previous approaches lack flexibility in both 3D data representation and shape generation, and therefore fail to generate highly diversified 3D shapes that conform to the given text descriptions. To address this, we propose an SDF autoencoder together with a Voxelized Diffusion model to learn and generate representations of voxelized signed distance fields (SDFs) of 3D shapes. Specifically, we design a novel UinU-Net architecture that embeds a local-focused inner network inside the standard U-Net architecture, which enables better reconstruction of patch-independent SDF representations. We further extend our approach to additional text-to-shape tasks, including text-conditioned shape completion and manipulation. Experimental results show that, compared to previous approaches, Diffusion-SDF generates higher-quality and more diversified 3D shapes that conform well to the given text descriptions. Code is available at: https://github.com/ttlmh/Diffusion-SDF
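To make the notion of a voxelized SDF split into independent patches concrete, the following is a minimal sketch and not the authors' implementation. The analytic sphere, the grid resolution of 64, the patch size of 8, and the helper names `sphere_sdf_grid` and `split_into_patches` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def sphere_sdf_grid(resolution=64, radius=0.5):
    """Voxelized SDF of a sphere centred at the origin, sampled on [-1, 1]^3.

    Values are negative inside the surface and positive outside.
    The sphere is a stand-in for a real shape (assumption).
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.sqrt(x**2 + y**2 + z**2) - radius

def split_into_patches(grid, patch=8):
    """Split a cubic grid (R, R, R) into non-overlapping patches (N, p, p, p)."""
    r = grid.shape[0]
    assert grid.shape == (r, r, r) and r % patch == 0
    n = r // patch
    # Separate each axis into (block index, offset inside block).
    blocks = grid.reshape(n, patch, n, patch, n, patch)
    # Bring the three block indices to the front, then flatten them.
    return blocks.transpose(0, 2, 4, 1, 3, 5).reshape(n**3, patch, patch, patch)

sdf = sphere_sdf_grid()
patches = split_into_patches(sdf)
print(sdf.shape, patches.shape)
```

Each patch could then be encoded on its own by a patch-wise autoencoder; the encoder itself is outside the scope of this sketch.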