Generating images with both photorealism and multiview 3D consistency is crucial for 3D-aware GANs, yet existing methods struggle to achieve the two simultaneously. Improving photorealism via CNN-based 2D super-resolution can break strict 3D consistency, while preserving 3D consistency by learning high-resolution 3D representations for direct rendering often compromises image quality. In this paper, we propose a novel learning strategy, namely 3D-to-2D imitation, which enables a 3D-aware GAN to generate high-quality images while maintaining strict 3D consistency, by letting the images synthesized by the generator's 3D rendering branch mimic those generated by its 2D super-resolution branch. We also introduce 3D-aware convolutions into the generator for better 3D representation learning, which further improves image generation quality. With these strategies, our method reaches FID scores of 5.4 and 4.3 on FFHQ and AFHQ-v2 Cats, respectively, at 512×512 resolution, substantially outperforming existing 3D-aware GANs that use direct 3D rendering and coming very close to the previous state-of-the-art method that leverages 2D super-resolution. Project website: https://seanchenxy.github.io/Mimic3DWeb.
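
The 3D-to-2D imitation idea can be illustrated with a minimal sketch. The abstract does not specify the loss, so the per-pixel mean squared error below is an assumption chosen for simplicity, and the names `imitation_loss`, `rendered_3d`, and `superres_2d` are hypothetical. The image from the 2D super-resolution branch is treated as a fixed target that the image from the 3D rendering branch is pulled toward.

```python
from typing import Sequence

# Hypothetical representation: one grayscale image as rows of pixel values.
Image = Sequence[Sequence[float]]


def imitation_loss(rendered_3d: Image, superres_2d: Image) -> float:
    """Mean squared error between a 3D-rendered image and a 2D super-resolved image.

    Assumption: a plain per-pixel L2 distance stands in for whatever
    imitation objective the method actually uses. The 2D image is the
    target; only the 3D branch would be updated to reduce this value.
    """
    if len(rendered_3d) != len(superres_2d):
        raise ValueError("images must have the same number of rows")

    total = 0.0
    count = 0
    for row_3d, row_2d in zip(rendered_3d, superres_2d):
        if len(row_3d) != len(row_2d):
            raise ValueError("image rows must have the same length")
        for pixel_3d, pixel_2d in zip(row_3d, row_2d):
            total += (pixel_3d - pixel_2d) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must be non-empty")
    return total / count


if __name__ == "__main__":
    # Toy 2x2 images: the 3D render differs from the 2D target in one pixel.
    target = [[0.0, 1.0], [0.5, 0.5]]
    render = [[0.0, 1.0], [0.5, 0.0]]
    print(imitation_loss(render, target))
```

In a real training loop this term would be computed on tensors with automatic differentiation, with gradients blocked from flowing into the 2D branch; that detail is likewise an assumption of this sketch.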