Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions remains a complex, largely manual process in practice. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design for autonomous driving. The framework leverages the coding capabilities of LLMs, already proven in other domains, to generate and evolve reward functions for highway scenarios. It begins by instructing the LLM to write an initial reward function in code, based on descriptions of the driving environment and the task. This code is then refined through iterative cycles of RL training and LLM self-reflection, exploiting the model's ability to review and improve its own output. We have also developed a dedicated prompt template that improves the LLM's understanding of complex driving simulations, ensuring the generation of effective, error-free code. Experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This indicates not only safer driving but also significant gains in development productivity.
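To make the described workflow concrete, the sketch below illustrates (i) the kind of dense reward function an LLM might emit for a highway task and (ii) the generate-train-reflect loop. This is a minimal illustration, not the paper's actual implementation: the helper names (query_llm, train_and_evaluate, evolve_reward), the reward terms, and the weights are all hypothetical stand-ins, with the training/LLM calls left as stubs.

```python
import numpy as np

# --- The kind of reward function the LLM is prompted to generate ---
def generated_highway_reward(ego_speed, collided, lane_index, num_lanes,
                             target_speed=30.0):
    """Dense highway reward balancing speed tracking, collision avoidance,
    and a right-lane preference. Terms and weights are illustrative only."""
    speed_term = np.clip(ego_speed / target_speed, 0.0, 1.0)
    lane_term = lane_index / max(num_lanes - 1, 1)   # rightmost lane -> 1.0
    collision_term = -1.0 if collided else 0.0
    return 0.6 * speed_term + 0.3 * lane_term + collision_term

# --- The generate-train-reflect loop, with hypothetical helper stubs ---
def query_llm(prompt: str) -> str:
    """Stand-in for an LLM API call that returns reward-function source code."""
    raise NotImplementedError

def train_and_evaluate(reward_source: str) -> tuple[float, str]:
    """Stand-in for an RL training run under the candidate reward;
    returns (success_rate, training_log)."""
    raise NotImplementedError

def evolve_reward(env_desc: str, task_desc: str, iterations: int = 5) -> str:
    """Iteratively refine LLM-generated reward code using training feedback."""
    # Step 1: initial generation from environment and task descriptions.
    code = query_llm(f"Environment:\n{env_desc}\nTask:\n{task_desc}\n"
                     "Write a Python reward function for this task.")
    best_code, best_rate = code, -np.inf
    for _ in range(iterations):
        # Step 2: train an RL agent with the candidate reward and score it.
        rate, log = train_and_evaluate(code)
        if rate > best_rate:
            best_code, best_rate = code, rate
        # Step 3: self-reflection; feed the code and its outcome back.
        code = query_llm(f"Reward code:\n{code}\nTraining log:\n{log}\n"
                         "Analyze the outcome and return an improved version.")
    return best_code
```

Keeping the best-scoring candidate across iterations guards against regressions from any single reflection step; a real system would additionally sandbox the execution of the generated reward code before training on it.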