Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment

Deep reinforcement learning is actively used for training autonomous car policies in simulated driving environments. Because many reinforcement learning algorithms are available and they have not been compared systematically across different driving scenarios, it remains unclear which ones are more effective for training autonomous car software in single-agent as well as multi-agent driving environments. A benchmarking framework for comparing deep reinforcement learning algorithms in vision-based autonomous driving would open up possibilities for training better autonomous car driving policies. To address these challenges, we provide an open and reusable benchmarking framework for the systematic evaluation and comparative analysis of deep reinforcement learning algorithms for autonomous driving in single- and multi-agent environments. Using the framework, we perform a comparative study of discrete and continuous action space deep reinforcement learning algorithms. We also propose a comprehensive multi-objective reward function designed for the evaluation of deep reinforcement learning-based autonomous driving agents. We run the experiments in vision-only, high-fidelity simulated urban driving environments. The results indicate that only some of the deep reinforcement learning algorithms perform consistently better across single- and multi-agent scenarios when trained in various multi-agent-only environment settings. For example, A3C- and TD3-based autonomous cars perform comparatively better, with more robust actions and minimal driving errors, in both single- and multi-agent scenarios. We conclude that different deep reinforcement learning algorithms exhibit different driving and testing performance in different scenarios, which underlines the need for their systematic comparative analysis. The benchmarking framework proposed in this paper facilitates such a comparison.
