Reinforcement learning has emerged as an important approach for autonomous driving. In reinforcement learning, the reward function encodes the objectives of the skill to be learned and guides the agent toward an optimal policy. Since autonomous driving is a complex domain with partially conflicting objectives of varying priority, designing a suitable reward function is a fundamental challenge. This paper highlights the gaps in such reward design by assessing formulations proposed in the literature and grouping individual objectives into four categories: Safety, Comfort, Progress, and Traffic Rules compliance. The limitations of the reviewed reward functions are then discussed, including how objectives are aggregated and their indifference to driving context; moreover, the reward categories are frequently inadequately formulated and lack standardization. The paper concludes by proposing future research directions to address these shortcomings, including a reward validation framework and structured rewards that are context-aware and able to resolve conflicts between objectives.
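To make the aggregation limitation concrete, the sketch below shows the fixed weighted-sum combination of category rewards that is common in the surveyed literature. All function names, weights, and values here are illustrative assumptions, not taken from any specific reviewed work.

```python
# Hypothetical sketch of a fixed weighted-sum reward aggregation.
# Weights and term values are illustrative assumptions only.

def weighted_sum_reward(safety, comfort, progress, rules,
                        weights=(1.0, 0.1, 0.5, 0.3)):
    """Combine per-category reward terms with fixed linear weights.

    A static linear combination cannot express context-dependent
    priorities: safety should dominate near a potential collision,
    yet a large progress term can mask a small safety penalty.
    """
    w_s, w_c, w_p, w_r = weights
    return w_s * safety + w_c * comfort + w_p * progress + w_r * rules

# A mildly unsafe but fast behaviour still receives a positive
# aggregate reward, illustrating the conflict-masking problem:
# 1.0 * (-0.2) + 0.5 * 1.0 = 0.3
r = weighted_sum_reward(safety=-0.2, comfort=0.0, progress=1.0, rules=0.0)
```

Context-aware or lexicographically structured rewards, as proposed in the paper's outlook, would instead let the safety term veto or dominate the others when the driving situation demands it.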