Multi-Agent Deep Reinforcement Learning for Cooperative and Competitive Autonomous Vehicles using AutoDRIVE Ecosystem

From arXiv. Accepted as a Multi-Agent Dynamic Games (MAD-Games) Workshop paper at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023.

This work presents a modular and parallelizable multi-agent deep reinforcement learning framework for instilling cooperative as well as competitive behaviors in autonomous vehicles. We introduce AutoDRIVE Ecosystem as an enabler for developing physically accurate and graphically realistic digital twins of Nigel and F1TENTH, two scaled autonomous vehicle platforms with unique qualities and capabilities, and leverage this ecosystem to train and deploy multi-agent reinforcement learning policies. We first investigate an intersection traversal problem using a set of cooperative vehicles (Nigel) that share limited state information with each other, in single-agent as well as multi-agent learning settings, using a common policy approach. We then investigate an adversarial head-to-head autonomous racing problem using a different set of vehicles (F1TENTH) in a multi-agent learning setting using an individual policy approach. In both sets of experiments, a decentralized learning architecture was adopted, which allowed robust training and testing of the approaches in stochastic environments, since the agents were mutually independent and exhibited asynchronous motion behavior. The problems were made harder by providing the agents with sparse observation spaces and requiring them to sample control commands that implicitly satisfied the imposed kinodynamic as well as safety constraints. The experimental results for both problems are reported in terms of quantitative metrics and qualitative remarks for the training as well as deployment phases.
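The distinction between the common policy approach (cooperative Nigel vehicles) and the individual policy approach (competing F1TENTH vehicles) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LinearPolicy` class, the agent names, the number of agents, and the observation and action dimensions are all hypothetical placeholders, and a random linear map stands in for a trained neural network policy. The sketch only shows how agents might be mapped to policies and how each agent acts on its own observation in a decentralized setup.

```python
import random
from typing import Dict, List, Sequence


class LinearPolicy:
    """Hypothetical stand-in for a learned policy (random linear map, not trained)."""

    def __init__(self, obs_dim: int, act_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(obs_dim)] for _ in range(act_dim)
        ]

    def act(self, obs: Sequence[float]) -> List[float]:
        # Clip each output to [-1, 1], as a normalized control command might be.
        return [
            max(-1.0, min(1.0, sum(w * o for w, o in zip(row, obs))))
            for row in self.weights
        ]


def common_policy(agent_ids: Sequence[str], obs_dim: int, act_dim: int) -> Dict[str, LinearPolicy]:
    """Common policy approach: every agent refers to the same policy object."""
    shared = LinearPolicy(obs_dim, act_dim, seed=0)
    return {agent_id: shared for agent_id in agent_ids}


def individual_policies(agent_ids: Sequence[str], obs_dim: int, act_dim: int) -> Dict[str, LinearPolicy]:
    """Individual policy approach: every agent owns a separate policy object."""
    return {
        agent_id: LinearPolicy(obs_dim, act_dim, seed=i)
        for i, agent_id in enumerate(agent_ids)
    }


def step(policies: Dict[str, LinearPolicy],
         observations: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Decentralized execution: each agent acts on its own observation only."""
    return {agent_id: policies[agent_id].act(obs) for agent_id, obs in observations.items()}


# Assumed agent counts and dimensions, chosen only for illustration.
cooperative = common_policy(["nigel_1", "nigel_2", "nigel_3", "nigel_4"], obs_dim=3, act_dim=2)
competitive = individual_policies(["f1tenth_1", "f1tenth_2"], obs_dim=3, act_dim=2)

coop_actions = step(cooperative, {a: [0.1, -0.2, 0.3] for a in cooperative})
race_actions = step(competitive, {a: [0.1, -0.2, 0.3] for a in competitive})
```

With a shared policy object, agents that receive identical observations produce identical actions, whereas separately owned policies can diverge, which is one reason an individual policy mapping suits adversarial settings.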
