Generative Adversarial Networks (GANs) have emerged as powerful tools for high-quality image generation and for real image editing via manipulation of their latent spaces. Recent advances in GANs include 3D-aware models such as EG3D, whose efficient triplane-based architecture can reconstruct 3D geometry from a single image. However, limited attention has been given to providing an integrated framework for 3D-aware, high-quality, reference-based image editing. This study addresses that gap by exploring and demonstrating the effectiveness of the triplane space for advanced reference-based edits. Our approach integrates encoding, automatic localization, spatial disentanglement of triplane features, and fusion learning to achieve the desired edits. The framework further demonstrates versatility and robustness across domains, extending to animal face edits, partially stylized edits such as cartoon faces, full-body clothing edits, and 360-degree head edits. Our method achieves state-of-the-art performance, both qualitatively and quantitatively, compared with relevant latent-direction-, text-, and image-guided 2D and 3D-aware GAN and diffusion methods.
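The core idea of spatially disentangling triplane features and fusing them within a localized region can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the `(3, C, H, W)` feature layout, and the use of a soft per-plane mask are all assumptions made for illustration; the paper additionally learns the localization and fusion rather than applying a fixed blend.

```python
import numpy as np

def fuse_triplane_features(src_planes, ref_planes, mask):
    """Blend reference triplane features into the source inside a soft mask.

    src_planes, ref_planes: arrays of shape (3, C, H, W), one feature map
    per plane (e.g., the XY, XZ, ZY planes of an EG3D-style triplane).
    mask: array of shape (3, 1, H, W) with values in [0, 1], marking the
    localized edit region on each plane.
    """
    # Convex combination: keep source features outside the mask,
    # take reference features inside it.
    return (1.0 - mask) * src_planes + mask * ref_planes

# Toy usage with random features and a hard box mask on each plane.
rng = np.random.default_rng(0)
src = rng.standard_normal((3, 32, 64, 64))
ref = rng.standard_normal((3, 32, 64, 64))
mask = np.zeros((3, 1, 64, 64))
mask[:, :, 16:48, 16:48] = 1.0  # edit only the central region

fused = fuse_triplane_features(src, ref, mask)
# Outside the mask the result equals the source features;
# inside it, the reference features.
assert np.allclose(fused[:, :, 0, 0], src[:, :, 0, 0])
assert np.allclose(fused[:, :, 32, 32], ref[:, :, 32, 32])
```

In practice the mask would come from an automatic localization step and the blend would be produced by a learned fusion module, but the masked-combination structure above captures why editing in triplane space localizes changes in 3D rather than only in a single rendered view.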