Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes from novel views using a dictionary of object codes that are decoded through a volumetric renderer. This enables the discovery of recurring visual and geometric patterns across scenes that are transferable to downstream tasks. We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% in success rate. We demonstrate that our approach performs unsupervised segmentation on more complex synthetic (THOR) and real (NYU Depth) scenes better than prior methods (29% relative improvement). Finally, we show that NRC improves depth-ordering accuracy in THOR by 5.5%.