Training and deploying reinforcement learning (RL) policies for robots, especially for accomplishing specific tasks, presents substantial challenges. Recent advancements have explored diverse reward function designs, training techniques, simulation-to-reality (sim-to-real) transfer, and performance analysis methodologies, yet these still require significant human intervention. This paper introduces an end-to-end framework for training and deploying RL policies, guided by Large Language Models (LLMs), and evaluates its effectiveness on bipedal robots. The framework consists of three interconnected modules: an LLM-guided reward function design module, an RL training module leveraging prior work, and a sim-to-real homomorphic evaluation module. This design significantly reduces the need for human input by requiring only essential simulation and deployment platforms, with the option to incorporate human-engineered strategies and historical data. We detail the construction of these modules and their advantages over traditional approaches, and demonstrate the framework's capability to autonomously develop and refine control strategies for bipedal robot locomotion, showcasing its potential to operate independently of human intervention.