Visual Semantic Navigation (VSN) is a fundamental problem in robotics, where an agent must navigate toward a target object in an unknown environment using mainly visual information. Most state-of-the-art VSN models are trained in simulation environments which, at best, use rendered scenes of the real world. These approaches typically rely on raw RGB data from the virtual scenes, which limits their ability to generalize to real-world environments due to domain adaptation issues. To tackle this problem, in this work, we propose SEMNAV, a novel approach that leverages semantic segmentation as the main visual input representation of the environment to enhance the agent's perception and decision-making capabilities. By explicitly incorporating this type of high-level semantic information, our model learns robust navigation policies that improve generalization across unseen environments, in both simulated and real-world settings. We also introduce the SEMNAV dataset, a newly curated dataset designed for training semantic segmentation-aware navigation models like SEMNAV. Our approach is evaluated extensively in both simulated environments and with real-world robotic platforms. Experimental results demonstrate that SEMNAV outperforms existing state-of-the-art VSN models, achieving higher success rates in the Habitat 2.0 simulation environment on the HM3D dataset. Furthermore, our real-world experiments highlight the effectiveness of semantic segmentation in mitigating the sim-to-real gap, making our model a promising solution for practical VSN-based robotic applications. The code and datasets are available at https://github.com/gramuah/semnav