In everyday life, frequently used objects such as cups rarely stay in fixed positions, often exist as multiple instances of the same category, and rest on carriers that change frequently. As a result, it is challenging for a robot to navigate efficiently to a specific instance. To tackle this challenge, the robot must continuously capture scene changes and update its plans. However, current object navigation approaches focus primarily on the semantic level and lack the ability to dynamically update their scene representation. This paper captures the relationships between frequently used objects and their static carriers. It constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG) and updates the carrying status during robot navigation to reflect dynamic changes in the scene. Based on the CRSG, we further propose an instance navigation strategy that models the navigation process as a Markov Decision Process. At each step, decisions are informed by the commonsense knowledge of a Large Language Model and by visual-language feature similarity. We designed a series of long-sequence navigation tasks for frequently used everyday items in the Habitat simulator. The results demonstrate that, by updating the CRSG, the robot can efficiently navigate to targets that have been moved. Additionally, we deployed our algorithm on a real robot and validated its practical effectiveness.
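To make the idea of updating carrying status concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy graph in which each static carrier holds a set of object instance ids. The names `CarrierGraph`, `observe`, `next_carrier`, `prior`, and `carrier_feats` are all hypothetical. The `prior` dictionary stands in for a commonsense score that might come from a language model, and the feature vectors stand in for visual-language embeddings. The selection step is a greedy one-step choice, not the Markov Decision Process described above.

```python
from dataclasses import dataclass, field
import math


@dataclass
class CarrierGraph:
    """Toy carrier-relationship graph (hypothetical): carriers hold object instances."""
    # carrier name -> set of object instance ids currently believed to be on it
    carried: dict = field(default_factory=dict)

    def observe(self, carrier, visible_objects):
        """Update carrying status after the robot looks at one carrier."""
        visible = set(visible_objects)
        # An object seen here can no longer be on any other carrier.
        for other, objs in self.carried.items():
            if other != carrier:
                objs -= visible
        # Replace the stale belief for this carrier with what was actually seen.
        self.carried[carrier] = visible

    def carrier_of(self, obj):
        """Return the carrier believed to hold obj, or None if unknown."""
        for carrier, objs in self.carried.items():
            if obj in objs:
                return carrier
        return None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def next_carrier(graph, target, prior, carrier_feats, target_feat, weight=0.5):
    """Greedy choice of the next carrier to visit (assumed scoring rule)."""
    known = graph.carrier_of(target)
    if known is not None:
        return known  # the graph already records where the target is

    def score(carrier):
        # Blend the assumed commonsense prior with feature similarity.
        sim = cosine(carrier_feats[carrier], target_feat)
        return weight * prior.get(carrier, 0.0) + (1 - weight) * sim

    return max(graph.carried, key=score)
```

For example, after `graph.observe("table", ["cup_1"])` and a later `graph.observe("desk", ["cup_1"])`, the graph records `cup_1` on the desk and no longer on the table, which is the kind of update that lets a later query go straight to the new location.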