Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution, high-fidelity generative modeling of human heads that also support novel-view rendering. However, to re-render or modify an existing image or video, one must first solve an inverse problem. Despite the success of universal optimization-based methods for 2D GAN inversion, such methods, when applied to 3D GANs, may fail to produce 3D-consistent renderings. Fast encoder-based techniques, such as those developed for StyleGAN, may also be less appealing because they struggle to preserve identity. In our work, we introduce a real-time method that bridges the gap between the two approaches by directly utilizing the tri-plane representation introduced for the EG3D generative model. In particular, we build upon a feed-forward convolutional encoder for the latent code and extend it with a fully-convolutional predictor of numerical offsets to the tri-planes. As shown in our work, the resulting renderings are similar in quality to those of optimization-based techniques and significantly outperform the baselines for novel views. As we demonstrate empirically, this is a consequence of operating directly in the tri-plane space rather than in the GAN parameter space, while retaining a trainable encoder-based approach.
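The abstract describes the key step only at a high level: the encoder yields a latent code, the generator turns it into tri-planes, and a second fully-convolutional network predicts offsets that are added to those planes before rendering. The following is a minimal sketch of that last step, not the authors' implementation. The function names `add_offsets`, `bilinear`, and `query_triplane`, the `[channels][height][width]` plane layout, the [-1, 1] coordinate convention, and averaging the three plane samples are all hypothetical choices; the encoder, generator, decoder MLP, and volume renderer are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one plane is [channels][height][width].
Plane = List[List[List[float]]]


def add_offsets(planes: Sequence[Plane], offsets: Sequence[Plane]) -> List[Plane]:
    """Element-wise sum of generator tri-planes and predicted offsets (assumed additive)."""
    return [
        [
            [[p + d for p, d in zip(prow, drow)] for prow, drow in zip(pch, dch)]
            for pch, dch in zip(plane, off)
        ]
        for plane, off in zip(planes, offsets)
    ]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample all channels at normalized coords u (width), v (height) in [-1, 1]."""
    h, w = len(plane[0]), len(plane[0][0])
    u = max(-1.0, min(1.0, u))  # clamp so pixel coordinates stay in range
    v = max(-1.0, min(1.0, v))
    x = (u + 1) / 2 * (w - 1)   # corners of the grid map to -1 and 1
    y = (v + 1) / 2 * (h - 1)
    x0, y0 = int(x), int(y)     # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    out = []
    for ch in plane:
        top = ch[y0][x0] * (1 - ax) + ch[y0][x1] * ax
        bot = ch[y1][x0] * (1 - ax) + ch[y1][x1] * ax
        out.append(top * (1 - ay) + bot * ay)
    return out


def query_triplane(planes: Sequence[Plane], point: Tuple[float, float, float]) -> List[float]:
    """Project a 3D point onto the XY, XZ and YZ planes and average the sampled features."""
    x, y, z = point
    projections = [(x, y), (x, z), (y, z)]
    samples = [bilinear(p, a, b) for p, (a, b) in zip(planes, projections)]
    return [sum(vals) / len(samples) for vals in zip(*samples)]


if __name__ == "__main__":
    # Toy example: zero "generator" planes plus constant offsets 1, 2, 3.
    base = [[[[0.0, 0.0], [0.0, 0.0]]] for _ in range(3)]
    offs = [[[[c, c], [c, c]]] for c in (1.0, 2.0, 3.0)]
    print(query_triplane(add_offsets(base, offs), (0.3, -0.2, 0.5)))
```

Because the offsets live in the same tensor space as the generator's tri-planes, the corrected features are queried exactly like the original ones; in this toy case every point returns the mean offset value, 2.0.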