Single image-to-3D generation is pivotal for crafting controllable 3D assets. Because the task is underconstrained, we guide the optimization with geometric priors from a 3D novel-view generation diffusion model and appearance priors from a 2D image generation method. We note that the training datasets of 2D and 3D diffusion models differ, so their outputs differ markedly in appearance: 2D models tend to deliver more detailed visuals, whereas 3D models produce results that are consistent across views but over-smooth. Hence, we optimize a set of 3D Gaussians using 3D priors in the spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain, via the Fourier transform, for higher visual quality. This 2D-3D hybrid Fourier Score Distillation objective (dubbed hy-FSD) can be integrated into existing 3D generation methods, yielding significant performance improvements. Building on this technique, we further develop Fourier123, an image-to-3D generation pipeline that creates high-quality 3D objects within one minute. Extensive experiments demonstrate that Fourier123 generates efficiently, converges rapidly, and produces visually pleasing results.
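To make the spatial/frequency split concrete, the following is a minimal sketch of a hybrid objective, not the hy-FSD formulation itself, which the abstract does not spell out. It assumes that the 3D prior and the 2D prior each supply a target image for the current rendered view, that the frequency-domain term compares Fourier amplitude spectra, and that the two terms are combined with a scalar weight. The function name `hybrid_fourier_loss` and the parameter `freq_weight` are hypothetical, and the score-distillation machinery (noise sampling, denoiser calls, gradients through the 3D Gaussians) is omitted.

```python
import numpy as np

def hybrid_fourier_loss(rendered, target_3d, target_2d, freq_weight=0.5):
    """Illustrative hybrid loss on (H, W, C) float images.

    Assumptions (not taken from the source): targets are plain images
    standing in for the priors' outputs, the frequency term uses
    amplitude spectra only, and `freq_weight` balances the two terms.
    """
    # Spatial-domain term: pixel-wise MSE against the 3D-prior target,
    # which is meant to encourage view-consistent geometry.
    spatial = np.mean((rendered - target_3d) ** 2)

    # Frequency-domain term: 2D FFT over the two image axes, then compare
    # amplitudes against the 2D-prior target, which carries finer detail.
    amp_rendered = np.abs(np.fft.fft2(rendered, axes=(0, 1), norm="ortho"))
    amp_target = np.abs(np.fft.fft2(target_2d, axes=(0, 1), norm="ortho"))
    frequency = np.mean((amp_rendered - amp_target) ** 2)

    return spatial + freq_weight * frequency

# Usage with random stand-in images (no real diffusion model involved).
rng = np.random.default_rng(0)
view = rng.random((64, 64, 3))
loss = hybrid_fourier_loss(view, rng.random((64, 64, 3)), rng.random((64, 64, 3)))
```

One consequence of comparing amplitudes only is that the frequency term ignores circular shifts of the 2D target, so under this assumption spatial layout is governed by the 3D-prior term alone while the 2D prior contributes only to the frequency content.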