In robotics and computer vision, efficient and accurate semantic mapping remains a significant challenge, made more pressing by the growing demand for intelligent machines that can comprehend and interact with complex environments. Conventional panoptic mapping methods are limited to predefined semantic classes, which makes them ineffective at handling novel or unforeseen objects. To address this limitation, we introduce Unified Promptable Panoptic Mapping (UPPM). UPPM leverages recent advances in foundation models to enable real-time, on-demand label generation from natural language prompts. By incorporating a dynamic labeling strategy into traditional panoptic mapping techniques, UPPM significantly improves adaptability and versatility while maintaining high map reconstruction performance. We demonstrate our approach on real-world and simulated datasets. The results show that UPPM accurately reconstructs scenes and segments objects while generating rich semantic labels through natural language interactions. A series of ablation experiments validates the advantages of foundation model-based labeling over fixed label sets.