Cooking recipes are challenging to translate to robot plans because they feature rich linguistic complexity, temporally extended and interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet into a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL uses a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR) and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
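To make the caching idea concrete, the following is a minimal sketch, not the Cook2LTL implementation. It assumes a hypothetical primitive action set (`PRIMITIVES`), a hypothetical `ActionLibrary` class, and a stand-in function `fake_llm_decompose` in place of a real LLM query. It also assumes one common way of encoding "do these steps in order" in LTL, namely nested eventually operators of the form F(a & F(b & F(c))); the formulae the system actually produces may differ.

```python
from typing import Callable, Dict, List

# Hypothetical primitive actions; the real set depends on the robot and simulator.
PRIMITIVES = {"pickup", "put", "slice", "toggle_on", "toggle_off"}


class ActionLibrary:
    """Caches decompositions of high-level actions into primitive actions."""

    def __init__(self, decompose: Callable[[str], List[str]]):
        self._decompose = decompose  # stand-in for an LLM query (assumption)
        self._cache: Dict[str, List[str]] = {}
        self.llm_calls = 0

    def ground(self, action: str) -> List[str]:
        # Primitive actions need no decomposition.
        if action in PRIMITIVES:
            return [action]
        # Query the (stubbed) LLM only the first time an action is seen.
        if action not in self._cache:
            self.llm_calls += 1
            self._cache[action] = self._decompose(action)
        return self._cache[action]


def to_ltl_sequence(steps: List[str]) -> str:
    """Encode an ordered list of steps as nested 'eventually' (F) operators."""
    if not steps:
        raise ValueError("at least one step is required")
    formula = steps[-1]
    for step in reversed(steps[:-1]):
        formula = f"{step} & F({formula})"
    return f"F({formula})"


def fake_llm_decompose(action: str) -> List[str]:
    # Hard-coded table standing in for an LLM response (illustrative only).
    table = {"boil": ["pickup", "put", "toggle_on"]}
    return table.get(action, ["pickup"])


if __name__ == "__main__":
    library = ActionLibrary(fake_llm_decompose)
    for _ in range(2):
        plan = library.ground("boil")  # second call is served from the cache
    print(to_ltl_sequence(plan))
    print("LLM calls:", library.llm_calls)
```

In this sketch, a repeated high-level action such as `boil` triggers only one decomposition query; later occurrences reuse the cached entry, which is the mechanism the abstract credits for the reduction in API calls, latency, and cost.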