Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models

Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
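The caching idea in the abstract can be illustrated with a minimal sketch: a high-level action is decomposed by the LLM only the first time it is seen, and later occurrences are served from a runtime action library. Everything named here is an assumption for illustration and is not taken from the Cook2LTL implementation: the `PRIMITIVES` set, the `ActionLibrary` class, the `fake_llm` stand-in for a real LLM call, and the `F(a & F(b))` ordered-visit pattern used to encode a sequence in LTL.

```python
from typing import Callable, Dict, List

# Hypothetical primitive actions; the real set depends on the robot and simulator.
PRIMITIVES = {"pickup", "put", "slice", "toggle_on", "toggle_off"}


class ActionLibrary:
    """Runtime cache mapping high-level actions to primitive action sequences."""

    def __init__(self, decompose: Callable[[str], List[str]]) -> None:
        self._decompose = decompose          # stand-in for an LLM query
        self._cache: Dict[str, List[str]] = {}
        self.llm_calls = 0                   # counts how often the LLM is queried

    def ground(self, action: str) -> List[str]:
        if action in PRIMITIVES:             # already executable, no lookup needed
            return [action]
        if action not in self._cache:        # first encounter: query and store
            self.llm_calls += 1
            self._cache[action] = self._decompose(action)
        return self._cache[action]           # later encounters: served from cache


def to_ltl(primitives: List[str]) -> str:
    """Encode an ordered sequence a1, a2, ... as F(a1 & F(a2 & ...)).

    Read as: a1 eventually holds, and from that point a2 eventually holds.
    This encoding is an illustrative assumption, not the paper's template.
    """
    formula = ""
    for name in reversed(primitives):
        formula = f"F({name} & {formula})" if formula else f"F({name})"
    return formula


def fake_llm(action: str) -> List[str]:
    # Hard-coded decompositions that play the role of LLM responses.
    table = {
        "chop": ["pickup", "slice", "put"],
        "heat": ["toggle_on", "toggle_off"],
    }
    return table[action]


library = ActionLibrary(fake_llm)
steps = ["chop", "heat", "chop", "pickup", "heat"]
formulas = [to_ltl(library.ground(step)) for step in steps]

for step, formula in zip(steps, formulas):
    print(f"{step}: {formula}")
print("LLM calls:", library.llm_calls)
```

In this toy run, five recipe steps trigger only two decomposition queries, because repeated actions and primitive actions never reach the LLM stand-in. The sketch only shows the mechanism; the percentage reductions reported in the abstract come from the authors' own evaluation.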

