PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

Source: arXiv. Project page: http://fuxiao0719.github.io/projects/panopticnerf360/ Code: https://github.com/fuxiao0719/PanopticNeRF/tree/panopticnerf360 (Minor Revision). arXiv admin note: text overlap with arXiv:2203.15224

Training perception systems for self-driving cars requires substantial 2D annotations that are labor-intensive to label manually. While existing datasets provide rich annotations on pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability of perception models. In this paper, we present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate high-quality panoptic labels and images from any viewpoint. Our key insight lies in exploiting the complementarity of 3D and 2D priors to mutually enhance geometry and semantics. Specifically, we propose to leverage coarse 3D bounding primitives and noisy 2D semantic and instance predictions to guide geometry optimization by encouraging predicted labels to match panoptic pseudo ground truth. Simultaneously, the improved geometry assists in filtering 3D and 2D annotation noise by fusing semantics in 3D space via a learned semantic field. To further enhance appearance, we combine an MLP with hash grids to yield hybrid scene features, striking a balance between high-frequency appearance and contiguous semantics. Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over existing label transfer methods on the challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360 enables omnidirectional rendering of high-fidelity, multi-view, and spatiotemporally consistent appearance, semantic, and instance labels. We make our code and data available at https://github.com/fuxiao0719/PanopticNeRF
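
The abstract describes rendering predicted labels from a learned semantic field and encouraging them to match pseudo ground truth. As a rough illustration only, the sketch below assumes standard NeRF-style volume rendering along a single ray and a plain cross-entropy loss against one pseudo label. It is not the authors' implementation: all function names, the sample values, and the two-class setup are hypothetical.

```python
import math

def render_weights(sigmas, deltas):
    """Standard NeRF compositing weights w_i = T_i * alpha_i along one ray."""
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def render_semantics(sigmas, deltas, logits_per_sample):
    """Composite per-sample class probabilities into one per-ray distribution."""
    weights = render_weights(sigmas, deltas)
    probs = [0.0] * len(logits_per_sample[0])
    for w, logits in zip(weights, logits_per_sample):
        for c, p in enumerate(softmax(logits)):
            probs[c] += w * p
    return probs  # sums to the accumulated opacity, which is at most 1

def pseudo_label_loss(probs, label, eps=1e-8):
    """Cross-entropy of the rendered distribution against one pseudo label."""
    return -math.log(probs[label] + eps)

# Hypothetical ray: empty space, then a surface that mostly predicts class 0.
sigmas = [0.0, 50.0, 50.0]
deltas = [0.1, 0.1, 0.1]
logits = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
probs = render_semantics(sigmas, deltas, logits)
print(probs, pseudo_label_loss(probs, 0))
```

Because the loss is computed on the composited distribution, its gradient reaches the densities as well as the semantic logits, which is one way label supervision could influence geometry in a setup like this.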
