TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

High-fidelity 3D head generation plays a crucial role in the film, animation, and video game industries. In industrial pipelines, studios typically enforce a fixed reference topology across all head assets, because a clean, uniform topology is a prerequisite for production-level rigging, skinning, and animation. In this paper, we present TOPOS, a framework for single-image-conditioned 3D head generation that jointly recovers geometry and appearance under such an industry-standard topology. General 3D generative models produce triangle meshes with inconsistent topology and large vertex counts, which hinders semantic correspondence and asset-level reuse. In contrast, TOPOS generates head meshes with a fixed, studio-style topology, enabling consistent vertex-level correspondence across all generated heads. To model heads under this unified topology, we propose a novel variational autoencoder architecture, termed TOPOS-VAE. Inspired by multimodal large language models (MLLMs), TOPOS-VAE uses a Perceiver Resampler to convert input point clouds, sampled from head meshes of diverse topologies, into the target reference topology. Building on TOPOS-VAE's structured latent space, we train a rectified flow transformer, TOPOS-DiT, to efficiently generate high-fidelity head meshes from a single image. We further present TOPOS-Texture, an end-to-end module that produces relightable UV texture maps from the same portrait image by fine-tuning a multimodal image generative model. The generated textures are spatially aligned with the underlying mesh geometry and faithfully preserve high-frequency appearance details. Extensive experiments demonstrate that TOPOS achieves state-of-the-art performance on 3D head generation, surpassing both classical face reconstruction methods and general 3D object generative models, highlighting its effectiveness for digital human creation.
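The resampling idea mentioned above can be illustrated with a minimal sketch. It is not the TOPOS-VAE implementation: it assumes a single-head cross-attention layer in which a fixed set of K query vectors (standing in for learned queries tied to the reference topology) attends over a point cloud of arbitrary size N, so the output always has K rows. The function name `resample`, the projection matrices `w_k` and `w_v`, and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def resample(points, queries, w_k, w_v):
    """Cross-attend a fixed set of queries over a variable-size point set.

    points:  (N, 3) input point cloud, N may differ per mesh
    queries: (K, D) fixed query vectors (hypothetical stand-in for learned ones)
    w_k, w_v: (3, D) key and value projections (hypothetical)
    returns: (K, D) features, one row per query, independent of N
    """
    keys = points @ w_k                                   # (N, D)
    values = points @ w_v                                 # (N, D)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (K, N)
    attn = softmax(scores, axis=-1)                       # each query's weights over points
    return attn @ values                                  # (K, D)

rng = np.random.default_rng(0)
K, D = 8, 16                       # illustrative sizes only
queries = rng.normal(size=(K, D))
w_k = rng.normal(size=(3, D))
w_v = rng.normal(size=(3, D))

# Two clouds with different point counts, as if sampled from different topologies.
cloud_a = rng.normal(size=(500, 3))
cloud_b = rng.normal(size=(1234, 3))
out_a = resample(cloud_a, queries, w_k, w_v)
out_b = resample(cloud_b, queries, w_k, w_v)
print(out_a.shape, out_b.shape)
```

Because attention sums over the input points, the output size depends only on the number of queries, and the result does not change when the input points are reordered. This is the property that lets inputs of differing topology be mapped to a single fixed-size representation in this sketch.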
