Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite these advances, it remains challenging to build neural fields for object categories that lack datasets like ShapeNet, which provide "canonicalized" object instances that are consistently aligned in 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net learns directly from continuous and noisy radiance fields using a Siamese network architecture designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
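To make the notion of pose canonicalization concrete, the sketch below aligns a density-weighted set of 3D samples (for example, points queried from a radiance field's density) to a canonical frame. It is not CaFi-Net: it substitutes classical density-weighted PCA for the learned, equivariant network, and the names `canonicalize` and `random_rotation`, along with the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def canonicalize(points, density):
    """Align density-weighted 3D samples to their principal axes (PCA baseline).

    points:  (N, 3) sample locations of a field.
    density: (N,) non-negative field density at those locations.
    Returns (canonical_points, rotation, centroid).
    """
    weights = density / density.sum()
    centroid = weights @ points                      # density-weighted mean
    centered = points - centroid
    cov = (centered * weights[:, None]).T @ centered # weighted 3x3 covariance
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues ascending
    axes = eigvecs[:, ::-1].copy()                   # largest variance first
    if np.linalg.det(axes) < 0:                      # keep a proper rotation
        axes[:, -1] *= -1
    return centered @ axes, axes, centroid


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q


# Synthetic, anisotropic "object" observed at two arbitrary poses (assumed data).
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
density = rng.uniform(0.1, 1.0, size=500)
pose_a = points @ random_rotation(rng).T + rng.normal(size=3)
pose_b = points @ random_rotation(rng).T + rng.normal(size=3)

canon_a, _, _ = canonicalize(pose_a, density)
canon_b, _, _ = canonicalize(pose_b, density)

# Both poses land in the same frame, up to a sign flip along each axis.
same_frame = np.allclose(np.abs(canon_a), np.abs(canon_b), atol=1e-6)
```

PCA fixes each axis only up to sign and becomes unstable when two principal variances are nearly equal, so it aligns a single instance with itself rather than guaranteeing a frame that is consistent across a whole category, which is the goal the abstract describes.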