A fundamental challenge in text-to-3D face generation is achieving high-quality geometry. The core difficulty is that vertices are distributed arbitrarily and intricately in 3D space, so existing models struggle to establish clean connectivity and produce suboptimal geometry. Our core insight is to simplify the underlying geometric structure by constraining the vertex distribution to a simple, regular manifold: a topological sphere. Building on this, we first propose the Spherical Geometry Representation, a novel face representation that anchors geometric signals to uniform spherical coordinates. This guarantees a regular point distribution from which mesh connectivity can be robustly reconstructed. Critically, the canonical sphere can be unwrapped into a 2D map, which makes the representation directly compatible with powerful 2D generative models. We then introduce Spherical Geometry Diffusion, a conditional diffusion framework built upon this 2D map. It enables diverse and controllable generation by jointly modeling geometry and texture, with the geometry explicitly conditioning texture synthesis. We demonstrate the method on three tasks: text-to-3D generation, face reconstruction, and text-based 3D editing. Extensive experiments show that our approach substantially outperforms existing methods in geometric quality, textual fidelity, and inference efficiency.
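To make the idea of anchoring geometry to a regular spherical grid concrete, the following is a minimal sketch, not the implementation used in this work. It assumes an equirectangular (polar angle by azimuth) grid and a single radial distance per grid point as the geometric signal; both choices, along with the function names `sphere_grid`, `grid_faces`, and `map_to_mesh`, are illustrative assumptions. Because the points lie on a regular grid, the mesh connectivity follows directly from the grid indices.

```python
import numpy as np


def sphere_grid(height, width):
    """Unit directions on a regular (polar, azimuth) grid, shape (H, W, 3)."""
    # Sample polar angles at cell centres so no grid point sits exactly on a pole.
    theta = (np.arange(height) + 0.5) * np.pi / height
    phi = np.arange(width) * 2.0 * np.pi / width
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    return np.stack(
        [np.sin(theta) * np.cos(phi),
         np.sin(theta) * np.sin(phi),
         np.cos(theta)],
        axis=-1,
    )


def grid_faces(height, width):
    """Two triangles per grid cell; columns wrap around in azimuth."""
    rows, cols = np.meshgrid(
        np.arange(height - 1), np.arange(width), indexing="ij"
    )
    nxt = (cols + 1) % width           # neighbour column, wrapping at the seam
    a = rows * width + cols            # current row, current column
    b = rows * width + nxt             # current row, next column
    c = (rows + 1) * width + cols      # next row, current column
    d = (rows + 1) * width + nxt       # next row, next column
    faces = np.stack(
        [np.stack([a, c, b], axis=-1), np.stack([b, c, d], axis=-1)], axis=2
    )
    return faces.reshape(-1, 3)


def map_to_mesh(radius_map):
    """Turn an (H, W) map of radial distances (assumed signal) into a mesh."""
    h, w = radius_map.shape
    verts = sphere_grid(h, w) * radius_map[..., None]
    return verts.reshape(-1, 3), grid_faces(h, w)


# A constant map of ones reproduces the unit sphere (minus the two polar caps).
verts, faces = map_to_mesh(np.ones((8, 16)))
```

In this sketch the 2D array `radius_map` plays the role of the unwrapped map, so an image-based generative model could in principle produce it directly. The two polar caps are left open for brevity, and a real representation may store richer per-point signals than a single radius.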