Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction for 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream applications such as augmented reality (AR) and robotics. We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. Our approach recovers an SDF-parameterized 3D shape, pose, and appearance from a single image of an object, without exploiting multiple views during training. More specifically, we leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme: a model produces a first guess of the solution, which is then refined via optimization. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios. We demonstrate state-of-the-art results on a variety of real and synthetic benchmarks.
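The hybrid inversion idea described in the abstract (a model proposes an initial solution, and a short optimization loop refines it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the generator is a toy linear map rather than a 3D-aware GAN, the "encoder" is a noisy pseudo-inverse standing in for a learned network, and the names `hybrid_inversion`, `generator`, `encoder`, and `grad_fn` are hypothetical and not taken from the paper.

```python
import numpy as np


def hybrid_inversion(generator, grad_fn, encoder, image, steps=10, lr=0.1):
    """Encoder gives a first guess of the latent; gradient descent refines it.

    Returns the refined latent and the reconstruction loss recorded before
    each step plus once after the final step (steps + 1 values).
    """
    z = encoder(image)  # first guess (feed-forward)
    losses = []
    for _ in range(steps):
        residual = generator(z) - image
        losses.append(float(np.sum(residual ** 2)))
        z = z - lr * grad_fn(z, image)  # refinement step
    losses.append(float(np.sum((generator(z) - image) ** 2)))
    return z, losses


# Toy setup (assumption): a linear "generator" G(z) = W z.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) / 8.0
z_true = rng.normal(size=8)
noise = 0.5 * rng.normal(size=8)


def generator(z):
    return W @ z


def grad_fn(z, x):
    # Gradient of ||W z - x||^2 with respect to z.
    return 2.0 * W.T @ (W @ z - x)


def encoder(x):
    # Stand-in for a learned encoder: a deliberately imperfect estimate.
    return np.linalg.pinv(W) @ x + noise


image = generator(z_true)

# Step size 1/L, where L = 2 * ||W||_2^2 is the gradient's Lipschitz constant,
# so every refinement step is guaranteed not to increase the loss.
lr = 0.5 / np.linalg.norm(W, 2) ** 2

z_hat, losses = hybrid_inversion(generator, grad_fn, encoder, image, steps=10, lr=lr)
print(f"loss after encoder guess: {losses[0]:.4f}")
print(f"loss after 10 refinement steps: {losses[-1]:.4f}")
```

The point of the sketch is only the division of labor: a good initial guess lets a small, fixed budget of refinement steps suffice, whereas optimization from a random latent would typically need many more iterations.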