Condition Matters in Full-head 3D GANs

Conditioning is crucial for stable training of full-head 3D GANs. Without any conditioning signal, the model suffers from severe mode collapse, making it impractical to train. However, previous full-head 3D GANs conventionally use the view angle as the conditioning input, which biases the learned 3D full-head space along the conditional view direction. The bias is evident in the significant differences in generation quality and diversity between the conditional view and the non-conditional views of the generated 3D heads, and it results in global incoherence across head regions. In this work, we propose to use a view-invariant semantic feature as the conditioning input, thereby decoupling the generative capability of 3D heads from the viewing direction. To construct a view-invariant semantic condition for each training image, we create a novel synthesized head image dataset: we leverage FLUX.1 Kontext to extend existing high-quality frontal face datasets to a wide range of view angles. The CLIP image feature extracted from the frontal view is then used as a shared semantic condition for all views of the same subject in the extended dataset, ensuring semantic alignment while eliminating directional bias. This also allows supervision from different views of the same subject to be consolidated under one shared semantic condition, which accelerates training and enhances the global coherence of the generated 3D heads. Moreover, because GANs often improve more slowly in diversity once the generator has learned a few modes that successfully fool the discriminator, our semantic conditioning encourages the generator to follow the true semantic distribution, thereby promoting continuous learning and diverse generation. Extensive experiments on full-head synthesis and single-view GAN inversion demonstrate that our method achieves significantly higher fidelity, diversity, and generalizability.
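The shared-condition idea can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ViewSample`, `build_shared_conditions`, `attach_conditions`, and `toy_encoder` are hypothetical, the per-image `yaw_deg` field is an assumed way of identifying the frontal view, and `toy_encoder` is only a deterministic stand-in for a real CLIP image encoder. The sketch shows one feature being computed per subject from its most frontal image and then attached to every view of that subject.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ViewSample:
    subject_id: str   # identity shared by all synthesized views of one head
    yaw_deg: float    # assumed camera yaw; 0 is treated as the frontal view
    image_path: str


def build_shared_conditions(
    samples: Sequence[ViewSample],
    encode_image: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Return one condition vector per subject, taken from its most frontal view."""
    frontal: Dict[str, ViewSample] = {}
    for sample in samples:
        best = frontal.get(sample.subject_id)
        if best is None or abs(sample.yaw_deg) < abs(best.yaw_deg):
            frontal[sample.subject_id] = sample
    # Encode each subject once; every view of that subject reuses the result.
    return {sid: encode_image(s.image_path) for sid, s in frontal.items()}


def attach_conditions(
    samples: Sequence[ViewSample],
    conditions: Dict[str, List[float]],
) -> List[Tuple[ViewSample, List[float]]]:
    """Pair every view with the shared condition of its subject."""
    return [(s, conditions[s.subject_id]) for s in samples]


def toy_encoder(image_path: str) -> List[float]:
    """Stand-in for a CLIP image encoder: a deterministic toy feature, not CLIP."""
    return [float(len(image_path)), float(sum(map(ord, image_path)) % 97)]


samples = [
    ViewSample("a", 0.0, "a_front.png"),
    ViewSample("a", 90.0, "a_side.png"),
    ViewSample("a", 180.0, "a_back.png"),
    ViewSample("b", -120.0, "b_back.png"),
    ViewSample("b", 5.0, "b_front.png"),
]
conditions = build_shared_conditions(samples, toy_encoder)
pairs = attach_conditions(samples, conditions)
```

In this sketch the side and back views of subject `a` receive exactly the same condition vector as its frontal view, so the condition carries no information about the viewing direction. In a real pipeline the stand-in encoder would be replaced by an actual CLIP image encoder and the pairs would feed the conditional generator and discriminator.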
