To make robots accessible to a broad audience, it is critical to endow them with the ability to take universal modes of communication, like commands given in natural language, and extract a concrete desired task specification, defined using a formal language like linear temporal logic (LTL). In this paper, we present a learning-based approach for translating from natural language commands to LTL specifications with very limited human-labeled training data. This is in stark contrast to existing natural-language to LTL translators, which require large human-labeled datasets, often in the form of labeled pairs of LTL formulas and natural language commands, to train the translator. To reduce reliance on human data, our approach generates a large synthetic training dataset through algorithmic generation of LTL formulas, conversion to structured English, and then exploiting the paraphrasing capabilities of modern large language models (LLMs) to synthesize a diverse corpus of natural language commands corresponding to the LTL formulas. We use this generated data to finetune an LLM and apply a constrained decoding procedure at inference time to ensure the returned LTL formula is syntactically correct. We evaluate our approach on three existing LTL/natural language datasets and show that we can translate natural language commands at 75\% accuracy with far less human data ($\le$12 annotations). Moreover, when training on large human-annotated datasets, our method achieves higher test accuracy (95\% on average) than prior work. Finally, we show the translated formulas can be used to plan long-horizon, multi-stage tasks on a 12D quadrotor.
翻译:为了让机器人能够被广泛用户使用,关键是要赋予它们理解通用交流模式(如自然语言指令)的能力,并能提取用形式化语言(如线性时序逻辑,LTL)定义的具体期望任务规约。本文提出了一种基于学习的方法,能够在极少量人工标注训练数据的情况下,将自然语言指令翻译为LTL规约。这与现有的自然语言到LTL翻译器形成鲜明对比,后者需要大规模人工标注数据集(通常以LTL公式与自然语言指令的配对形式)来训练翻译器。为了减少对人工数据的依赖,我们的方法通过算法生成LTL公式、将其转换为结构化英语,并利用现代大型语言模型(LLMs)的释义能力,合成与LTL公式对应的多样化自然语言指令语料库。我们使用生成的微调数据对LLM进行训练,并在推理时应用约束解码过程,以确保返回的LTL公式在语法上正确。我们在三个现有的LTL/自然语言数据集上评估了该方法,结果表明,我们能够以75%的准确率翻译自然语言指令,且所需人工数据极少(≤12条标注)。此外,当使用大规模人工标注数据集训练时,我们的方法实现了比先前工作更高的测试准确率(平均95%)。最后,我们展示了翻译后的公式可用于在12自由度四旋翼飞行器上规划长时域、多阶段任务。