Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. However, fixed prompts fall short when it comes to parsing open-domain natural language and adapting to a user's idiosyncratic procedures, which are not known at prompt engineering time. In this paper, we introduce HELPER, an embodied agent equipped with an external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting: relevant memories are retrieved based on the current dialogue, instruction, correction, or vision-language model (VLM) description, and used as in-context prompt examples for LLM querying. The memory is expanded during deployment to include pairs of the user's language and action plans, to assist future inferences and personalize them to the user's language and routines. HELPER sets a new state of the art on the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD), with a 1.7x improvement over the previous state of the art for TfD. Our models, code, and video results can be found on our project website: https://helper-agent-llm.github.io.
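The retrieve-then-prompt loop described in the abstract can be illustrated with a minimal sketch. This is not the HELPER implementation: the bag-of-words `embed` function stands in for whatever text encoder the real system uses, the `Memory` class and `build_prompt` helper are hypothetical names, the action names inside the example programs (`goto`, `pickup`, and so on) are invented for illustration, and no LLM is actually called. It only shows how language-program pairs might be retrieved by similarity, placed in a prompt as in-context examples, and extended with a new pair after deployment.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    """External memory of (language, program) pairs (hypothetical structure)."""
    pairs: list = field(default_factory=list)

    def add(self, language: str, program: str) -> None:
        # Memory expansion: store a new pair for future retrieval.
        self.pairs.append((language, program))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank stored pairs by similarity of their language side to the query.
        q = embed(query)
        ranked = sorted(self.pairs, key=lambda p: cosine(q, embed(p[0])), reverse=True)
        return ranked[:k]


def build_prompt(memory: Memory, query: str, k: int = 2) -> str:
    """Assemble retrieved pairs as in-context examples, followed by the query."""
    parts = [f"Input: {lang}\nProgram:\n{prog}" for lang, prog in memory.retrieve(query, k)]
    parts.append(f"Input: {query}\nProgram:")
    return "\n\n".join(parts)


# Hypothetical seed memory; the program syntax is illustrative only.
memory = Memory()
memory.add("make a cup of coffee", 'goto("Mug")\npickup("Mug")\nplace("CoffeeMachine")')
memory.add("slice the tomato", 'pickup("Knife")\nslice("Tomato")')
memory.add("put the mug in the sink", 'pickup("Mug")\nplace("Sink")')

query = "please make coffee in a clean mug"
top = memory.retrieve(query, k=1)
prompt = build_prompt(memory, query, k=2)

# After a plan is produced and accepted, the new pair would be stored
# so that later requests phrased the same way retrieve it directly.
memory.add(query, 'clean("Mug")\nplace("CoffeeMachine")')
```

In this sketch the prompt ends with the open `Program:` slot for the current request, which is where an LLM completion would be requested. Storing the resulting pair afterwards is what lets later queries in the same user's phrasing retrieve a closer example.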