The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still restricted to single tasks and offer limited generalization capabilities. In this work, we introduce a comprehensive benchmark and an autonomous learning framework, RoboCoder, aimed at enhancing the generalization capabilities of robots in complex environments. Unlike traditional methods that focus on single-task learning, our research emphasizes the development of a general-purpose robotic coding algorithm that enables robots to leverage basic skills to tackle increasingly complex tasks. The newly proposed benchmark consists of 80 manually designed tasks across 7 distinct entities, testing the models' ability to learn from minimal initial mastery. Initial testing revealed that even advanced models like GPT-4 could achieve only a 47% pass rate in three-shot scenarios with humanoid entities. To address these limitations, the RoboCoder framework integrates LLMs with a dynamic learning system that uses real-time environmental feedback to continuously update and refine action code. This adaptive method achieved a 36% relative improvement. Our code will be released.
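The feedback-driven refinement described in the abstract can be illustrated with a minimal sketch. This is not RoboCoder's implementation: the names `Feedback`, `refine_action_code`, `toy_generate`, and `toy_execute` are hypothetical, the generator is a stand-in for an LLM call, and the executor is a stand-in for a simulated environment. The sketch only shows the general pattern of proposing action code, running it, and feeding the result back into the next proposal.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Result of running one piece of action code (hypothetical structure)."""
    success: bool
    message: str  # e.g. an error trace or an environment observation


def refine_action_code(
    task: str,
    generate: Callable[[str, List[Tuple[str, str]]], str],
    execute: Callable[[str], Feedback],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Propose code, run it, and feed the result back until success or the budget is spent."""
    history: List[Tuple[str, str]] = []  # (code, feedback message) pairs from failed rounds
    code = ""
    for round_idx in range(1, max_rounds + 1):
        code = generate(task, history)   # stand-in for an LLM call (assumption)
        feedback = execute(code)         # stand-in for the environment (assumption)
        if feedback.success:
            return code, True, round_idx
        history.append((code, feedback.message))
    return code, False, max_rounds


def toy_generate(task: str, history: List[Tuple[str, str]]) -> str:
    # Toy policy: take one more step for every failed attempt seen so far.
    return f"step({1 + len(history)})"


def toy_execute(code: str) -> Feedback:
    # Toy environment: the goal is reached once at least 3 steps are taken.
    steps = int(code[len("step("):-1])
    if steps >= 3:
        return Feedback(True, "goal reached")
    return Feedback(False, f"stopped short after {steps} step(s)")


code, ok, rounds = refine_action_code(
    "walk to the door", toy_generate, toy_execute, max_rounds=5
)
```

In this toy run the loop succeeds on the third round, because each failure message lengthens the history and the generator responds by taking one more step. A real system would replace both stand-ins, and would likely pass the full feedback text into the prompt rather than only counting failures.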