Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and for real image editing through manipulation of their latent spaces. Recent advances in GANs include 3D-aware models such as EG3D, whose efficient triplane-based architecture can reconstruct 3D geometry from a single image. However, limited attention has been given to an integrated framework for 3D-aware, high-quality, reference-based image editing. We address this gap by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits. Our approach integrates encoding, automatic localization, spatial disentanglement of triplane features, and fusion learning to achieve the desired edits. We demonstrate that our approach excels across diverse domains, including human faces, 360-degree heads, animal faces, partially stylized edits such as cartoon faces, full-body clothing edits, and edits on class-agnostic samples. Both qualitatively and quantitatively, our method achieves state-of-the-art performance over relevant latent-direction-, text-, and image-guided 2D and 3D-aware diffusion and GAN methods.
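To make the spatial disentanglement and fusion step concrete, below is a minimal sketch of mask-guided triplane fusion: features from a reference triplane are blended into the source triplane inside a localized region. This is an illustrative NumPy stand-in, not the paper's implementation; the `fuse_triplanes` function, the feature shapes, and the precomputed `mask` (which the method would obtain via automatic localization) are all assumptions, and applying one 2D mask to all three planes is a simplification purely for illustration.

```python
import numpy as np

def fuse_triplanes(src_planes, ref_planes, mask):
    """Blend reference triplane features into the source within a masked region.

    src_planes, ref_planes: (3, C, H, W) triplane feature grids (assumed shapes).
    mask: (H, W) soft mask in [0, 1] marking the region to edit (e.g., hair).
    Applying the same 2D mask to all three planes is a simplification for
    illustration; a real system would localize per plane.
    """
    m = mask[None, None]  # broadcast over the plane and channel axes
    return (1.0 - m) * src_planes + m * ref_planes

# Usage with random stand-in features and a hypothetical localized region.
src = np.random.randn(3, 32, 256, 256).astype(np.float32)
ref = np.random.randn(3, 32, 256, 256).astype(np.float32)
mask = np.zeros((256, 256), dtype=np.float32)
mask[64:128, 96:160] = 1.0  # hypothetical region found by localization
edited = fuse_triplanes(src, ref, mask)  # (3, 32, 256, 256) fused triplane
```

In the full pipeline this naive linear blend would be replaced by learned fusion, with the edited triplane then rendered through the 3D-aware generator.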