Although 3D-aware GANs based on neural radiance fields have achieved competitive performance, their applicability is still limited to objects or scenes for which ground-truth camera poses, or pose prediction models, are available under a clearly defined canonical camera pose. To extend the range of applicable datasets, we propose a novel 3D-aware GAN optimization technique based on contrastive learning with implicit pose embeddings. To this end, we first revise the discriminator design to remove its dependency on ground-truth camera poses. Then, to capture complex and challenging 3D scene structures more effectively, we make the discriminator estimate a high-dimensional implicit pose embedding from a given image and perform contrastive learning on that embedding. Because the proposed approach neither looks up nor estimates camera poses, it can be applied to datasets in which the canonical camera pose is ill-defined. Experimental results show that our algorithm outperforms existing methods by large margins on datasets with multiple object categories and inconsistent canonical camera poses.
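As a rough illustration of contrastive learning on pose embeddings, the following sketch computes an InfoNCE-style loss over a batch of embedding pairs. It is not the formulation used in this work: the abstract does not specify the loss, the embedding dimension, or how positive pairs are formed. The sketch assumes that row i of `anchor` and row i of `positive` are embeddings of two images sharing a pose, that all other rows in the batch act as negatives, and that the function name `info_nce` and the `temperature` value are hypothetical choices.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE-style loss for a batch of embedding pairs (illustrative only).

    anchor, positive: arrays of shape (batch, dim). Row i of `positive` is
    assumed to be the positive match for row i of `anchor`; every other row
    is treated as a negative. This pairing rule is an assumption.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature: shape (B, B).
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair sits on the diagonal; average its negative log-probability.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))  # stand-in for discriminator pose embeddings
    noisy = emb + 0.01 * rng.normal(size=emb.shape)
    print("aligned pairs:   ", info_nce(emb, noisy))
    print("mismatched pairs:", info_nce(emb, emb[::-1]))
```

In this toy setup, correctly paired embeddings yield a much lower loss than deliberately mismatched ones, which is the signal a discriminator's embedding head would be trained on under these assumptions.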