RACE-SM: Reinforcement Learning Based Autonomous Control for Social On-Ramp Merging

Autonomous parallel-style on-ramp merging in human-controlled traffic remains an open problem in autonomous vehicle control. Existing non-learning-based solutions rely primarily on rules and optimization, and these methods have been shown to present significant challenges. Recent advances in Deep Reinforcement Learning have shown promise and attracted significant academic interest; however, the available learning-based approaches pay inadequate attention to other highway vehicles and often rely on inaccurate road traffic assumptions. In addition, the parallel-style case is rarely considered. This paper proposes a novel learning-based model for acceleration and lane-change decision making that explicitly considers the utility to both the ego vehicle and its surrounding vehicles, which may be cooperative or uncooperative, in order to produce socially acceptable behaviour. The novel reward function uses Social Value Orientation to set the vehicle's level of social cooperation: it is divided into ego-vehicle utility and surrounding-vehicle utility, which are weighted according to the model's designated Social Value Orientation. A two-lane highway with an on-ramp divided into a taper-style section and a parallel-style section is considered. Simulation results indicate the importance of considering surrounding vehicles in reward function design and show that the proposed model matches or surpasses those in the literature in terms of collisions, while also introducing socially courteous behaviour that avoids near misses and anti-social behaviour through direct consideration of the effect of merging on surrounding vehicles.
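
The abstract does not state the exact form of the reward. As a minimal sketch, assuming the cosine/sine angular weighting that is commonly used with Social Value Orientation, the split between ego-vehicle and surrounding-vehicle utility could look as follows. The function name, its arguments, and the weighting itself are illustrative assumptions, not the paper's definition.

```python
import math


def svo_reward(ego_utility: float, surrounding_utility: float, svo_angle_deg: float) -> float:
    """Blend ego and surrounding-vehicle utility by an SVO angle.

    Assumption: the weights are cos(phi) for the ego term and sin(phi) for
    the surrounding-vehicle term. The paper's actual weighting may differ.
    """
    phi = math.radians(svo_angle_deg)
    return math.cos(phi) * ego_utility + math.sin(phi) * surrounding_utility


# Hypothetical utilities for one time step of a merge manoeuvre.
egoistic = svo_reward(1.0, 0.5, 0.0)     # only the ego term contributes
prosocial = svo_reward(1.0, 0.5, 45.0)   # both terms weighted equally
altruistic = svo_reward(1.0, 0.5, 90.0)  # only the surrounding term contributes
```

Under this assumed form, an angle of 0 degrees yields a purely self-interested agent, 90 degrees yields one that values only its neighbours, and intermediate angles trade the two off.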
