Uncertainty-Aware DRL for Autonomous Vehicle Crowd Navigation in Shared Space

Safe, socially compliant, and efficient navigation of low-speed autonomous vehicles (AVs) in pedestrian-rich environments requires accounting for pedestrians' future positions and their interactions with the vehicle and with one another. Predicted pedestrian trajectories are inevitably uncertain because some pedestrian states (e.g., intent) are unobserved, yet existing deep reinforcement learning (DRL) algorithms for crowd navigation often neglect this uncertainty when using predicted trajectories to guide policy learning. This omission limits the usefulness of predictions when they diverge from the ground truth. This work introduces an integrated prediction and planning approach that incorporates the uncertainties of predicted pedestrian states into the training of a model-free DRL algorithm. A novel reward function encourages the AV to respect pedestrians' personal space, decrease speed during close approaches, and minimize the probability of collision with their predicted paths. Unlike previous DRL methods, our model, designed for AV operation in crowded spaces, is trained in a novel simulation environment that reflects realistic pedestrian behaviour in a space shared with vehicles. Results show a 40% decrease in collision rate and a 15% increase in minimum distance to pedestrians compared to the state-of-the-art model that does not account for prediction uncertainty. Additionally, the approach outperforms model predictive control methods that incorporate the same prediction uncertainties in terms of both performance and computational time, while producing trajectories closer to those of human drivers in similar scenarios.
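As a rough illustration of the three reward components named above (personal space, speed during close approaches, and collision probability with predicted paths), a minimal sketch follows. The abstract gives no formula, so this is not the paper's reward function: the isotropic Gaussian model of predicted positions, the Monte Carlo estimate, and every name, threshold, and weight (`PredictedPedestrian`, `personal_space`, `slow_zone`, `w_collision`, and so on) are assumptions made purely for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PredictedPedestrian:
    """Hypothetical container: predicted 2D position with isotropic Gaussian uncertainty."""
    mean_x: float
    mean_y: float
    sigma: float  # assumed standard deviation of the predicted position, in metres


def collision_probability(av_x, av_y, ped, radius=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of P(distance between pedestrian and AV < radius)."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(n_samples):
        px = rng.gauss(ped.mean_x, ped.sigma)
        py = rng.gauss(ped.mean_y, ped.sigma)
        if math.hypot(px - av_x, py - av_y) < radius:
            hits += 1
    return hits / n_samples


def step_reward(av_x, av_y, av_speed, current_peds, predicted_peds,
                personal_space=1.5, slow_zone=3.0, comfortable_speed=1.0,
                w_space=1.0, w_speed=0.5, w_collision=2.0):
    """Sum of three illustrative penalty terms; the result is never positive.

    current_peds: list of (x, y) observed pedestrian positions.
    predicted_peds: list of PredictedPedestrian for a future time step.
    All thresholds and weights are assumed values, not taken from the paper.
    """
    reward = 0.0
    for ped_x, ped_y in current_peds:
        dist = math.hypot(ped_x - av_x, ped_y - av_y)
        # Penalise intrusion into the pedestrian's personal space.
        if dist < personal_space:
            reward -= w_space * (personal_space - dist) / personal_space
        # Penalise excess speed while close to a pedestrian.
        if dist < slow_zone and av_speed > comfortable_speed:
            reward -= w_speed * (av_speed - comfortable_speed)
    for ped in predicted_peds:
        # Penalise the probability of overlapping an uncertain predicted position.
        reward -= w_collision * collision_probability(av_x, av_y, ped)
    return reward
```

In this sketch the collision term responds to both the predicted mean and its spread, so a prediction whose uncertainty region overlaps the vehicle is penalised even when the predicted mean alone would look safe.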
