Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce a Generalizable Semantic Neural Radiance Field (GSNeRF), which incorporates image semantics into the synthesis process so that both novel-view images and the associated semantic maps can be produced for unseen scenes. GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering. The former takes multi-view image inputs and extracts semantic and geometry features of the scene. Guided by the resulting geometry information, the latter performs both image and semantic rendering with improved performance. Our experiments confirm that GSNeRF performs favorably against prior works on both novel-view image and semantic segmentation synthesis, and further verify the effectiveness of our sampling strategy for visual rendering.
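The abstract does not spell out the sampling strategy, so the following is only a minimal sketch of what depth-guided rendering could look like, not the authors' implementation. It assumes a per-ray depth estimate is already available (for instance from the first stage), places a few samples in a fixed window around that depth, and composites both color and semantic logits with the same standard NeRF quadrature weights. The function names, the `half_width` window, and the uniform spacing are all hypothetical choices made for illustration.

```python
import numpy as np


def depth_guided_samples(depth, near, far, n_samples=8, half_width=0.2):
    """Place n_samples (>= 2) per ray in a window centred on a depth estimate.

    `half_width` is a hypothetical fixed window size; the window is clipped
    to the [near, far] bounds of the ray.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo = np.clip(depth - half_width, near, far)
    hi = np.clip(depth + half_width, near, far)
    steps = np.linspace(0.0, 1.0, n_samples)
    # Shape (n_rays, n_samples): evenly spaced distances inside each window.
    return lo[:, None] + (hi - lo)[:, None] * steps[None, :]


def composite(sigma, values, t):
    """Standard NeRF volume-rendering quadrature along each ray.

    sigma:  (n_rays, n_samples) densities
    values: (n_rays, n_samples, channels) colors or semantic logits
    t:      (n_rays, n_samples) sample distances, increasing along the ray
    """
    delta = np.diff(t, axis=-1)
    # Reuse the last interval length for the final sample.
    delta = np.concatenate([delta, delta[:, -1:]], axis=-1)
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance before each sample (exclusive cumulative product).
    trans = np.cumprod(1.0 - alpha, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    weights = trans * alpha
    return (weights[..., None] * values).sum(axis=1), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = depth_guided_samples([1.0, 2.0], near=0.5, far=2.1)
    sigma = rng.uniform(0.0, 5.0, size=t.shape)
    rgb = rng.uniform(size=t.shape + (3,))      # per-sample colors
    logits = rng.normal(size=t.shape + (4,))    # per-sample semantic logits
    image, _ = composite(sigma, rgb, t)
    semantics, _ = composite(sigma, logits, t)
    print(image.shape, semantics.shape)
```

Sharing one set of weights between the color and semantic outputs keeps the two renderings geometrically consistent, and concentrating samples near the estimated surface is the usual motivation for depth guidance: fewer samples are spent in empty space.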