Training and deploying reinforcement learning (RL) policies for robots, especially for accomplishing specific tasks, presents substantial challenges. Recent advances have explored diverse reward function designs, training techniques, simulation-to-reality (sim-to-real) transfer, and performance analysis methodologies, yet these approaches still require significant human intervention. This paper introduces an end-to-end framework for training and deploying RL policies, guided by Large Language Models (LLMs), and evaluates its effectiveness on bipedal robots. The framework consists of three interconnected modules: an LLM-guided reward function design module, an RL training module leveraging prior work, and a sim-to-real homomorphic evaluation module. This design significantly reduces the need for human input by relying only on essential simulation and deployment platforms, with the option to incorporate human-engineered strategies and historical data. We detail the construction of these modules and their advantages over traditional approaches, and demonstrate the framework's ability to autonomously develop and refine control strategies for bipedal robot locomotion, showcasing its potential to operate without human intervention.
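The closed loop formed by the three modules can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the functions `propose_reward`, `train_and_evaluate`, and `llm_guided_loop` are stand-in names, and the LLM call and RL training are replaced by toy stubs so the control flow is visible.

```python
# Hedged sketch of the loop described above: an LLM proposes a reward
# function, an RL trainer evaluates it in simulation, and evaluation
# feedback flows back to the LLM for refinement. All names and numbers
# here are illustrative assumptions, not the framework's real API.

def propose_reward(feedback):
    """Stand-in for the LLM-guided reward design module.

    A real implementation would prompt an LLM with the task description
    and prior evaluation feedback, then parse a reward function from its
    reply. Here we merely reweight two hand-picked terms using the
    feedback, to keep the example runnable."""
    velocity_weight = 1.0 + 0.5 * feedback.get("velocity_error", 0.0)
    energy_weight = 0.1
    return lambda state: (velocity_weight * state["forward_velocity"]
                          - energy_weight * state["energy"])

def train_and_evaluate(reward_fn):
    """Stand-in for the RL training and sim-to-real evaluation modules.

    A real version would train a policy (e.g. with PPO) in simulation and
    compare simulated against hardware rollouts; here we score a single
    fixed state and report a toy tracking error as feedback."""
    state = {"forward_velocity": 0.8, "energy": 2.0}
    score = reward_fn(state)
    feedback = {"velocity_error": max(0.0, 1.0 - state["forward_velocity"])}
    return score, feedback

def llm_guided_loop(iterations=3):
    """Iterate reward proposal -> training -> evaluation, keeping the best score."""
    feedback = {}
    best_score = float("-inf")
    for _ in range(iterations):
        reward_fn = propose_reward(feedback)
        score, feedback = train_and_evaluate(reward_fn)
        best_score = max(best_score, score)
    return best_score

print(llm_guided_loop())
```

The point of the sketch is the data flow: human input is replaced by the feedback dictionary that the evaluation module hands back to the reward-design module on each iteration.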