Large language models (LLMs) offer new ways of empowering people to program robot applications, namely code generation via prompting. However, the code generated by LLMs is susceptible to errors. This work reports a preliminary exploration that empirically characterizes common errors produced by LLMs in robot programming. We categorize these errors by the phase in which they occur: interpretation or execution. In this work, we focus on errors in execution and observe that they are caused by LLMs being "forgetful" of key information provided in user prompts. Based on this observation, we propose prompt engineering tactics designed to reduce errors in execution. We then demonstrate the effectiveness of these tactics with three language models: ChatGPT, Bard, and LLaMA-2. Finally, we discuss lessons learned from using LLMs in robot programming and call for the benchmarking of LLM-powered end-user development of robot applications.