Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing it to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian splatting GAN framework that enables stable training and high-quality 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed not only to stabilize training but also to facilitate efficient rendering and straightforward scaling, enabling output resolutions up to $2048^2$. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images in which subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation. Check out our project page: https://fraunhoferhhi.github.io/cgs-gan/