Although recent rapid progress in 3D generative neural networks has greatly improved 3D shape generation, it remains inconvenient for ordinary users to create 3D shapes and to control the local geometry of generated shapes. To address these challenges, we propose a diffusion-based 3D generation framework, locally attentional SDF diffusion, that models plausible 3D shapes from 2D sketch image input. Our method is built on a two-stage diffusion model. The first stage, named occupancy-diffusion, generates a low-resolution occupancy field that approximates the shape shell. The second stage, named SDF-diffusion, synthesizes a high-resolution signed distance field within the occupied voxels determined by the first stage, from which fine geometry is extracted. Our model is empowered by a novel view-aware local attention mechanism for image-conditioned shape generation, which uses 2D image patch features to guide 3D voxel feature learning and greatly improves local controllability and model generalizability. Through extensive experiments on sketch-conditioned and category-conditioned 3D shape generation tasks, we validate and demonstrate that our method produces plausible and diverse 3D shapes, and that it offers superior controllability and generalizability over existing work. Our code and trained models are available at https://zhengxinyang.github.io/projects/LAS-Diffusion.html
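The hand-off between the two stages can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the 0.5 threshold, the upsampling factor, the grid sizes, and the constant used for voxels outside the shell are all assumptions chosen for illustration, and the diffusion models themselves are replaced by placeholder arrays. The sketch only shows how a coarse occupancy grid could restrict the voxels in which a finer signed distance field is synthesized.

```python
import numpy as np

def occupied_fine_mask(coarse_occ, factor, threshold=0.5):
    """Expand a coarse occupancy grid into a boolean mask on the fine grid.

    Hypothetical helper: each coarse voxel whose occupancy exceeds
    `threshold` marks a factor x factor x factor block of fine voxels.
    """
    coarse_mask = coarse_occ > threshold
    return (coarse_mask.repeat(factor, axis=0)
                       .repeat(factor, axis=1)
                       .repeat(factor, axis=2))

def assemble_sdf(fine_mask, sdf_values, outside_value=1.0):
    """Scatter per-voxel SDF values into a dense grid.

    Voxels outside the mask receive a constant `outside_value`
    (an assumption), so only the occupied region carries synthesized values.
    """
    sdf = np.full(fine_mask.shape, outside_value, dtype=np.float32)
    sdf[fine_mask] = sdf_values
    return sdf

# Placeholder for a first-stage output: a 4^3 grid with one occupied voxel.
coarse = np.zeros((4, 4, 4), dtype=np.float32)
coarse[1, 2, 3] = 1.0

mask = occupied_fine_mask(coarse, factor=2)   # boolean mask on an 8^3 grid
# Placeholder for a second-stage output: one SDF value per occupied fine voxel.
values = np.full(int(mask.sum()), -0.1, dtype=np.float32)
sdf = assemble_sdf(mask, values)
```

Restricting the second stage to the masked voxels means the high-resolution field is only computed near the shape shell, which is the motivation the abstract gives for splitting generation into two stages.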