Natural language provides a powerful modality for programming robots to perform temporal tasks. Linear temporal logic (LTL) provides unambiguous semantics for formal descriptions of temporal tasks. However, existing approaches cannot accurately and robustly translate English sentences to their equivalent LTL formulas in unseen environments. To address this problem, we propose Lang2LTL, a novel modular system that leverages pretrained large language models to first extract referring expressions from a natural language command, then ground the expressions to real-world landmarks and objects, and finally translate the command into an LTL task specification for the robot. It enables any robotic system to interpret natural language navigation commands without additional training, provided that it tracks its position and has a semantic map with landmarks labeled with free-form text. We demonstrate state-of-the-art generalization to multi-scale navigation domains such as OpenStreetMap (OSM) and CleanUp World (a simulated household environment). Lang2LTL achieves an average accuracy of 88.4% in translating challenging LTL formulas in 22 unseen OSM environments, as evaluated on a new corpus of over 10,000 commands, 22 times better than the previous state of the art (SoTA). Without modification, the best-performing Lang2LTL model on the OSM dataset can translate commands in CleanUp World with 82.8% accuracy. As part of our proposed comprehensive evaluation procedures, we collected a new labeled dataset of English commands representing 2,125 unique LTL formulas. It is the largest dataset to date of natural language commands paired with LTL specifications for robotic tasks and has the most diverse LTL formulas, 40 times more than the previous largest dataset. Finally, we integrated Lang2LTL with a planner to command a quadruped mobile robot to perform multi-step navigational tasks in an analog real-world environment created in the lab.
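The three stages named in the abstract (extract referring expressions, ground them to landmarks, translate to LTL) can be illustrated with a toy sketch. This is not the Lang2LTL implementation: a regular expression stands in for the language-model extraction step, string similarity from `difflib` stands in for grounding, and a fixed sequenced-visit template stands in for translation. The semantic map, landmark identifiers, and every function name below are hypothetical and chosen only for illustration.

```python
import difflib
import re

# Hypothetical semantic map: landmark id -> free-form text label.
SEMANTIC_MAP = {
    "bookshelf_1": "wooden bookshelf",
    "desk_a": "standing desk",
    "kitchen_counter": "kitchen counter",
}


def extract_referring_expressions(command):
    """Stand-in for extraction: capture the phrase after each 'go to the'."""
    return re.findall(r"go to the ([a-z ]+?)(?:,|\.|$)", command.lower())


def ground(expression, semantic_map):
    """Stand-in for grounding: pick the landmark with the most similar label."""
    def similarity(item):
        _landmark_id, label = item
        return difflib.SequenceMatcher(None, expression, label).ratio()

    return max(semantic_map.items(), key=similarity)[0]


def sequenced_visit(propositions):
    """Stand-in for translation: visit the propositions in the given order.

    Builds F (p1 & F (p2 & ... F pn)), where F is the LTL 'finally' operator.
    """
    formula = f"F {propositions[-1]}"
    for prop in reversed(propositions[:-1]):
        formula = f"F ({prop} & {formula})"
    return formula


def lang2ltl_sketch(command, semantic_map):
    """Chain the three toy stages: extract, ground, translate."""
    expressions = extract_referring_expressions(command)
    landmarks = [ground(expr, semantic_map) for expr in expressions]
    return sequenced_visit(landmarks)


if __name__ == "__main__":
    print(lang2ltl_sketch("Go to the desk, then go to the bookshelf.", SEMANTIC_MAP))
```

The sketch only handles commands of the form "go to the X, then go to the Y"; the point is the modular structure, in which each stage can be replaced independently, not the coverage of any single stage.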