Panoptic maps enable robots to reason about both geometry and semantics. However, open-vocabulary models repeatedly produce closely related labels, which split panoptic entities and degrade volumetric consistency. The proposed UPPM advances open-world scene understanding by leveraging foundation models to introduce a panoptic Dynamic Descriptor that reconciles open-vocabulary labels with a unified category structure and geometric size priors. These dynamic descriptors are fused within a multi-resolution multi-TSDF map using language-guided open-vocabulary panoptic segmentation and semantic retrieval, resulting in a persistent and promptable panoptic map without additional model training. In our evaluation experiments, UPPM shows the best overall performance in terms of map reconstruction accuracy and panoptic segmentation quality. An ablation study investigates the contribution of each component of UPPM (custom NMS, blurry-frame filtering, and unified semantics) to overall system performance. UPPM thus preserves open-vocabulary interpretability while delivering strong geometric and panoptic accuracy.