When instructing robots, users want to flexibly express constraints, refer to arbitrary landmarks, and verify robot behavior, while robots must disambiguate instructions into specifications and ground instruction referents in the real world. To address this problem, we propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow complex, open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of correct-by-construction robot behaviors. We conduct a large-scale evaluation of LIMP on 150 instructions across five real-world environments, demonstrating its versatility and ease of deployment in diverse, unstructured domains. LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and additionally achieves a 79\% success rate on complex spatiotemporal instructions, significantly outperforming baselines that reach only 38\%. See supplementary materials and demo videos at https://robotlimp.github.io.