RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging

Autonomous parallel-style on-ramp merging in human-controlled traffic remains an open problem for autonomous vehicle control. Existing non-learning-based solutions for vehicle control rely primarily on rules and optimization, and these methods have been observed to present significant challenges. Recent advances in Deep Reinforcement Learning have shown promise and have attracted significant academic interest; however, the available learning-based approaches pay inadequate attention to other highway vehicles and often rely on inaccurate road-traffic assumptions. In addition, the parallel-style case is rarely considered. This paper proposes a novel learning-based model for acceleration and lane-change decision making that explicitly considers the utility to both the ego vehicle and its surrounding vehicles, which may be cooperative or uncooperative, in order to produce socially acceptable behaviour. The novel reward function uses Social Value Orientation (SVO) to weight the vehicle's level of social cooperation: it is divided into ego-vehicle utility and surrounding-vehicle utility, which are weighted according to the model's designated SVO. A two-lane highway with an on-ramp divided into a taper-style section and a parallel-style section is considered. Simulation results indicate the importance of considering surrounding vehicles in reward function design and show that the proposed model matches or surpasses those in the literature in terms of collisions, while also introducing socially courteous behaviour that avoids near misses and anti-social behaviour through direct consideration of the effect of merging on surrounding vehicles.
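The SVO-weighted split between ego-vehicle and surrounding-vehicle utility can be illustrated with a short sketch. This is not the authors' reward function: it assumes the common angular SVO formulation, in which an angle φ weights ego utility by cos φ and surrounding-vehicle utility by sin φ (φ = 0° purely egoistic, φ = 45° prosocial, φ = 90° purely altruistic), and it further assumes the surrounding-vehicle term is the mean of per-vehicle utilities. The function name `svo_reward` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def svo_reward(r_ego: float, r_others: Sequence[float], phi_deg: float) -> float:
    """Blend ego and surrounding-vehicle utility with an SVO angle (hypothetical sketch).

    Assumed form: r = cos(phi) * r_ego + sin(phi) * mean(r_others).
    """
    phi = math.radians(phi_deg)
    # Assumption: surrounding-vehicle utility is averaged; 0.0 when no neighbours exist.
    r_sv = sum(r_others) / len(r_others) if r_others else 0.0
    # The SVO angle trades off self-interest against the utility of others.
    return math.cos(phi) * r_ego + math.sin(phi) * r_sv


if __name__ == "__main__":
    # An egoistic agent ignores neighbours; a prosocial one weights both terms equally.
    print(svo_reward(1.0, [-1.0, -1.0], 0.0))
    print(svo_reward(1.0, [-1.0, -1.0], 45.0))
```

With φ = 45°, a merge that benefits the ego vehicle but harms its neighbours is no longer rewarded, which is the kind of socially aware trade-off the abstract describes; how the actual model defines each utility term is not specified here.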
