This paper presents the first significant object detection framework, NeRF-RPN, which operates directly on NeRF. Given a pre-trained NeRF model, NeRF-RPN aims to detect all bounding boxes of objects in a scene. By exploiting a novel voxel representation that incorporates multi-scale 3D neural volumetric features, we demonstrate that it is possible to regress the 3D bounding boxes of objects in NeRF directly, without rendering the NeRF at any viewpoint. NeRF-RPN is a general framework and can be applied to detect objects without class labels. We experiment with NeRF-RPN using various backbone architectures, RPN head designs, and loss functions. All of them can be trained in an end-to-end manner to estimate high-quality 3D bounding boxes. To facilitate future research in object detection for NeRF, we built a new benchmark dataset consisting of both synthetic and real-world data with careful labeling and cleanup. Code and dataset are available at https://github.com/lyclyc52/NeRF_RPN.
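To make the idea of a voxel representation sampled from a radiance field concrete, the following is a minimal sketch and not the NeRF-RPN implementation. It assumes a hypothetical `toy_field` function standing in for a pre-trained NeRF, queries it on a regular 3D lattice to obtain an RGB-plus-density volume, and then fits a naive axis-aligned box by density thresholding. The thresholding step is only a placeholder for the learned region proposal network described above; the names `toy_field`, `sample_voxel_grid`, and `occupied_box` are all illustrative assumptions.

```python
import numpy as np


def toy_field(points):
    """Hypothetical stand-in for a pre-trained NeRF (assumption, not a real model).

    Maps (N, 3) points to (N, 4) values: RGB followed by density.
    The "scene" is a single dense sphere of radius 0.25 centred at (0.5, 0.5, 0.5).
    """
    center = np.array([0.5, 0.5, 0.5])
    dist = np.linalg.norm(points - center, axis=1)
    density = np.where(dist < 0.25, 10.0, 0.0)
    rgb = np.clip(points, 0.0, 1.0)  # arbitrary colour, just to fill the channels
    return np.concatenate([rgb, density[:, None]], axis=1)


def sample_voxel_grid(field, bounds_min, bounds_max, resolution):
    """Query the field on a regular lattice; returns a (R, R, R, 4) volume."""
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(bounds_min, bounds_max)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    feats = field(points)
    return feats.reshape(resolution, resolution, resolution, -1)


def occupied_box(grid, bounds_min, bounds_max, threshold=1.0):
    """Naive axis-aligned box around voxels whose density exceeds the threshold.

    This is a simple heuristic used for illustration, not a learned RPN.
    """
    resolution = grid.shape[0]
    idx = np.argwhere(grid[..., 3] > threshold)
    if idx.size == 0:
        return None
    lo = np.asarray(bounds_min, dtype=float)
    hi = np.asarray(bounds_max, dtype=float)
    step = (hi - lo) / (resolution - 1)
    return lo + idx.min(axis=0) * step, lo + idx.max(axis=0) * step


bounds_min, bounds_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
grid = sample_voxel_grid(toy_field, bounds_min, bounds_max, resolution=16)
box = occupied_box(grid, bounds_min, bounds_max)
```

Here `grid` plays the role of the input volume that a 3D backbone would consume, and no image is rendered at any point. A learned detector would replace `occupied_box` with a network that regresses box parameters from multi-scale features of this volume.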