In this paper, we introduce PI3D, a novel and efficient framework that uses pre-trained text-to-image diffusion models to generate high-quality 3D shapes in minutes. On the one hand, it fine-tunes a pre-trained 2D diffusion model into a 3D diffusion model, which gains 3D generative capability while retaining the generalization of the 2D model. On the other hand, it applies score distillation sampling with 2D diffusion models to quickly improve the quality of the sampled 3D shapes. PI3D transfers knowledge from image generation to triplane generation by treating each triplane as a set of pseudo-images. We adapt the modules of the pre-trained model to enable hybrid training on both pseudo-images and real images, a well-established strategy for improving generalizability. PI3D is efficient: it samples diverse 3D models in seconds and refines them in minutes. Experimental results confirm the advantages of PI3D over existing methods, whether based on 3D diffusion models or on lifting 2D diffusion models, in the fast generation of 3D-consistent, high-quality models. PI3D is a promising advance in text-to-3D generation, and we hope it will inspire more research into 3D generation that leverages knowledge from both 2D and 3D data.
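To make the idea of treating a triplane as a set of pseudo-images concrete, the following is a minimal sketch and not the PI3D implementation. It assumes a triplane stored as a NumPy array of shape (3, C, H, W), an XY/XZ/YZ plane ordering, nearest-neighbour lookup, and summation of the three plane features; none of these choices is specified in the abstract, and all function names are hypothetical.

```python
import numpy as np


def triplane_to_pseudo_images(triplane):
    """Split a (3, C, H, W) triplane into three (C, H, W) pseudo-images."""
    assert triplane.ndim == 4 and triplane.shape[0] == 3
    return [triplane[i] for i in range(3)]


def pseudo_images_to_triplane(images):
    """Stack three (C, H, W) pseudo-images back into a (3, C, H, W) triplane."""
    assert len(images) == 3
    return np.stack(images, axis=0)


def query_feature(triplane, point):
    """Look up the feature of a 3D point in [-1, 1]^3.

    The point is projected onto the three planes (assumed order: XY, XZ, YZ),
    the nearest pixel is read from each plane, and the three feature vectors
    are summed (an assumed aggregation rule).
    """
    _, _, height, width = triplane.shape
    x, y, z = point

    def to_index(coord, size):
        # Map a coordinate in [-1, 1] to a valid pixel index in [0, size - 1].
        idx = int(round((coord + 1.0) / 2.0 * (size - 1)))
        return min(max(idx, 0), size - 1)

    # (horizontal, vertical) coordinates used on each plane.
    projections = [(x, y), (x, z), (y, z)]
    features = [
        triplane[i][:, to_index(v, height), to_index(u, width)]
        for i, (u, v) in enumerate(projections)
    ]
    return np.sum(features, axis=0)
```

Under these assumptions, each pseudo-image has the same layout as a multi-channel image, which is what lets image-model modules operate on it; the round trip through `pseudo_images_to_triplane` recovers the original array unchanged.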