We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained on datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into hybrid neural fields: a tri-plane containing geometry features, followed by a Multi-Layer Perceptron (MLP) for decoding signed distance values. A variational auto-encoder is employed to compress the tri-planes into a latent tri-plane space, on which the denoising diffusion process is performed. Applying diffusion to these latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one only needs to append empty blocks that overlap with the current scene and extrapolate the existing latent tri-planes to populate the new blocks. Extrapolation is done by conditioning the generation process on feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that blend harmoniously with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent, and unbounded large 3D scenes with unprecedentedly high-quality shapes in both indoor and outdoor scenarios.
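The tri-plane representation decoded by an MLP, as described above, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the plane resolution, feature dimension, aggregation of the three planes by summation, and the one-layer stand-in "MLP" are all assumptions made here for brevity.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at continuous coords u, v in [0, 1]."""
    C, H, W = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[:, y0, x0]
            + wx * (1 - wy) * plane[:, y0, x1]
            + (1 - wx) * wy * plane[:, y1, x0]
            + wx * wy * plane[:, y1, x1])

def triplane_sdf(planes, mlp, p):
    """Decode a signed distance at 3D point p in [0, 1]^3.

    Each axis-aligned plane is sampled with the two matching coordinates of p;
    the three feature vectors are aggregated (here: summed, an assumption) and
    passed through the decoder MLP.
    """
    pxy, pxz, pyz = planes
    feat = (bilinear_sample(pxy, p[0], p[1])
            + bilinear_sample(pxz, p[0], p[2])
            + bilinear_sample(pyz, p[1], p[2]))
    return mlp(feat)

# Toy setup: random feature planes and a single linear layer standing in for
# the trained SDF decoder (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
C, R = 8, 16                                   # feature channels, plane resolution
planes = [rng.standard_normal((C, R, R)) for _ in range(3)]
w = rng.standard_normal(C)
mlp = lambda f: float(np.tanh(f @ w))          # squashed scalar "distance"

sdf = triplane_sdf(planes, mlp, np.array([0.3, 0.5, 0.7]))
```

In the full method, a shared MLP decodes all blocks, so the per-block geometry lives entirely in the tri-plane features; this is what makes compressing the tri-planes with a VAE and diffusing in the latent tri-plane space sufficient for generation.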