Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. This paper presents a novel approach to optimizing the decisions of AVs using Proximal Policy Optimization (PPO), a reinforcement learning algorithm. We learn a policy that minimizes traffic congestion (measured as the time needed to cross the scenario) and pollution at a roundabout in Milan, Italy. Through empirical analysis, we demonstrate that our approach reduces both crossing time and pollution levels. Furthermore, we qualitatively evaluate the learned policy using a cutting-edge simulator cockpit to assess its performance in near-real-world conditions. To gauge the practicality and acceptability of the policy, we conducted evaluations with human participants using the simulator, focusing on metrics such as perceived traffic smoothness and perceived safety. Overall, our findings show that human-driven vehicles benefit from optimizing AV dynamics. In addition, study participants perceived the scenario with 80\% AVs as safer than the scenario with 20\% AVs, and they rated its traffic smoothness higher as well.