Text-to-3D asset generation has recently shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects from prompts. 3D diffusion models offer good 3D consistency, but their quality and generalization are limited because 3D training data is expensive and hard to obtain. 2D diffusion models generalize well and generate fine detail, but their 3D consistency is hard to guarantee. This paper attempts to bridge the strengths of the two types of diffusion models through the recent explicit and efficient 3D Gaussian splatting representation. We propose a fast 3D generation framework, named \name, in which the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. \name can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, and the generated instances can be rendered directly in real time. Demos and code are available at https://taoranyi.com/gaussiandreamer/.
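
The abstract names noisy point growing and color perturbation without specifying how they are computed. The following is a minimal illustrative sketch under our own assumptions, not the paper's implementation: candidate points are sampled uniformly inside the bounding box of the initial point cloud, a candidate is kept only if it lies within `max_dist` of an existing point, and each kept point copies its nearest neighbor's color plus uniform noise. The function name `grow_and_perturb` and the parameters `num_candidates`, `max_dist`, and `color_noise` are hypothetical.

```python
import math
import random


def grow_and_perturb(points, colors, num_candidates=500,
                     max_dist=0.1, color_noise=0.1, seed=0):
    """Densify a colored point cloud (illustrative assumption, not the paper's code).

    points: list of (x, y, z) tuples; colors: list of (r, g, b) tuples in [0, 1].
    Returns the original points and colors followed by the newly grown ones.
    """
    rng = random.Random(seed)

    # Axis-aligned bounding box of the initial point cloud.
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]

    out_points = [tuple(p) for p in points]
    out_colors = [tuple(c) for c in colors]

    for _ in range(num_candidates):
        # Sample a candidate uniformly inside the bounding box.
        cand = tuple(rng.uniform(lo[i], hi[i]) for i in range(3))

        # Brute-force nearest original point; a KD-tree would scale better.
        j = min(range(len(points)), key=lambda k: math.dist(cand, points[k]))

        # Keep the candidate only if it is close to the existing surface.
        if math.dist(cand, points[j]) < max_dist:
            # Copy the neighbor's color, add uniform noise, clamp to [0, 1].
            noisy = tuple(
                min(1.0, max(0.0, ch + rng.uniform(-color_noise, color_noise)))
                for ch in colors[j]
            )
            out_points.append(cand)
            out_colors.append(noisy)

    return out_points, out_colors
```

In a pipeline like the one described, the enlarged point set would serve as the positions and colors of the initial 3D Gaussians before the 2D diffusion model refines them; that wiring is omitted here.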