This work presents a sustainable multi-agent deep reinforcement learning framework capable of selectively scaling parallelized training workloads on demand, and transferring the trained policies from simulation to reality using minimal hardware resources. We introduce the AutoDRIVE Ecosystem as an enabling digital twin framework to train, deploy, and transfer cooperative as well as competitive multi-agent reinforcement learning policies from simulation to reality. In particular, we first investigate an intersection traversal problem involving 4 cooperative vehicles (Nigel) that share limited state information, in both single-agent and multi-agent learning settings, using a common policy approach. We then investigate an adversarial autonomous racing problem between 2 vehicles (F1TENTH) using an individual policy approach. In both sets of experiments, a decentralized learning architecture was adopted, which allowed robust training and testing of the policies in stochastic environments. The agents were provided with realistically sparse observation spaces and were restricted to sampling control actions that implicitly satisfied the imposed kinodynamic and safety constraints. The experimental results for both problem statements are reported in terms of quantitative metrics and qualitative remarks for the training as well as deployment phases. We also discuss the agent and environment parallelization techniques adopted to efficiently accelerate MARL training, and analyze their computational performance. Finally, we demonstrate a resource-aware transition of the trained policies from simulation to reality using the proposed digital twin framework.