This paper investigates the potential of enhancing Neural Radiance Fields (NeRF) with semantics to expand their applications. Although NeRF has proven useful in real-world applications such as VR and digital creation, its lack of semantics hinders interaction with objects in complex scenes. We propose to imitate the backbone features of off-the-shelf perception models to achieve zero-shot semantic segmentation with NeRF. Our framework reformulates the segmentation process by directly rendering semantic features and applying only the decoder of the perception model. This eliminates the need for an expensive backbone and improves 3D consistency. Furthermore, the learned semantics can be projected onto extracted mesh surfaces for real-time interaction. With the state-of-the-art Segment Anything Model (SAM), our framework accelerates segmentation by a factor of 16 while producing masks of comparable quality. Experimental results demonstrate the efficacy and computational advantages of our approach. Project page: \url{https://me.kiui.moe/san/}.
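The idea of rendering semantic features directly can be illustrated with a minimal sketch. It assumes the standard NeRF alpha-compositing rule applied to per-sample feature vectors instead of colors, and a mean-squared-error objective against the teacher backbone's feature at the same pixel. The function names `render_feature` and `distillation_loss`, and the choice of MSE, are illustrative assumptions and are not taken from the paper.

```python
import math


def render_feature(sigmas, deltas, features):
    """Alpha-composite per-sample feature vectors along a single ray.

    sigmas:   densities at each sample (assumed non-negative)
    deltas:   distances between consecutive samples
    features: one feature vector (list of floats) per sample
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for sigma, delta, feat in zip(sigmas, deltas, features):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(dim):
            rendered[k] += weight * feat[k]
        transmittance *= 1.0 - alpha
    return rendered


def distillation_loss(rendered, teacher):
    """Mean squared error between a rendered feature and the teacher feature.

    MSE is an assumed objective for this sketch only.
    """
    return sum((r - t) ** 2 for r, t in zip(rendered, teacher)) / len(rendered)


if __name__ == "__main__":
    # Toy ray: empty space, then an opaque surface, then an occluded sample.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pixel_feature = render_feature([0.0, 1e9, 5.0], [0.1, 0.1, 0.1], feats)
    print(pixel_feature, distillation_loss(pixel_feature, [0.0, 1.0]))
```

Under this reading, a feature map rendered this way for every pixel would stand in for the backbone output, so that only the perception model's decoder has to run at segmentation time.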