As a preliminary work, NeRF-Det unifies the tasks of novel view synthesis and 3D perception, demonstrating that perceptual tasks can benefit from novel-view-synthesis methods such as NeRF and significantly improving the performance of indoor multi-view 3D object detection. Its gains come from using the geometry MLP of NeRF to direct the detection head's attention to crucial regions and from incorporating a self-supervised loss based on novel view rendering. To better exploit the advantages of the continuous spatial representation offered by neural rendering, we introduce NeRF-DetS, a novel 3D perception network. Its key component is the Multi-level Sampling-Adaptive Network, which makes the sampling process adaptive in a coarse-to-fine manner. We also propose a superior multi-view information fusion method, Multi-head Weighted Fusion, which avoids the loss of multi-view information incurred by simple arithmetic averaging while keeping computational costs low. NeRF-DetS outperforms the competitive NeRF-Det on the ScanNetV2 dataset, achieving +5.02% and +5.92% improvements in [email protected] and [email protected], respectively.
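The abstract does not spell out the details of Multi-head Weighted Fusion, but the core idea it contrasts with arithmetic averaging can be sketched as follows: each head predicts a scalar weight per view, normalizes the weights across views with a softmax, and takes a weighted sum of its feature slice. This is a minimal NumPy sketch under those assumptions; `multi_head_weighted_fusion` and its random projection vectors are hypothetical stand-ins for learned layers, not the paper's actual implementation.

```python
import numpy as np

def multi_head_weighted_fusion(feats, n_heads=4, rng=None):
    """Fuse per-view features of shape (V, C) into a single (C,) vector.

    Each head handles one C//n_heads slice of the channels: it scores
    every view with a linear projection (random here, learned in practice),
    softmaxes the scores across views, and returns the weighted sum of its
    slice. Head outputs are concatenated back to C channels.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_views, channels = feats.shape
    assert channels % n_heads == 0
    d = channels // n_heads
    fused_slices = []
    for h in range(n_heads):
        slice_h = feats[:, h * d:(h + 1) * d]        # (V, d)
        w_proj = rng.standard_normal(d)              # stand-in for a learned layer
        logits = slice_h @ w_proj                    # one score per view, (V,)
        w = np.exp(logits - logits.max())
        w /= w.sum()                                 # softmax over views
        fused_slices.append((w[:, None] * slice_h).sum(axis=0))  # (d,)
    return np.concatenate(fused_slices)              # (C,)

# 8 camera views, 32-channel features; compare with a plain arithmetic mean,
# which weights every view equally regardless of content.
views = np.random.default_rng(1).standard_normal((8, 32))
fused = multi_head_weighted_fusion(views)
mean_fused = views.mean(axis=0)
print(fused.shape, mean_fused.shape)  # (32,) (32,)
```

Unlike the arithmetic mean, the per-view weights let each head down-weight views that contribute little to a given sample location, which is the information loss the fusion method is meant to address.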