In this paper, we address the challenge of decomposing Neural Radiance Fields (NeRF) into objects from an open vocabulary, a critical task for object manipulation in 3D reconstruction and view synthesis. Current techniques for NeRF decomposition trade off the flexibility of processing open-vocabulary queries against the accuracy of 3D segmentation. We present Open-vocabulary Embedded Neural Radiance Fields (Open-NeRF), which leverages large-scale, off-the-shelf segmentation models such as the Segment Anything Model (SAM) and introduces an integrate-and-distill paradigm with hierarchical embeddings to achieve both the flexibility of open-vocabulary querying and 3D segmentation accuracy. Open-NeRF first uses large-scale foundation models to generate hierarchical 2D mask proposals from varying viewpoints. These proposals are then aligned via tracking approaches, integrated within the 3D space, and subsequently distilled into the 3D field. This process ensures consistent recognition and granularity of objects across viewpoints, even in challenging scenarios involving occlusion and indistinct features. Our experimental results show that the proposed Open-NeRF outperforms state-of-the-art methods such as LERF \cite{lerf} and FFD \cite{ffd} in open-vocabulary scenarios. Open-NeRF offers a promising solution to NeRF decomposition guided by open-vocabulary queries, enabling novel applications in robotics and vision-language interaction in open-world 3D scenes.