Multi-agent reinforcement learning (MARL) for cyber-physical vehicle systems typically requires prohibitively long training times owing to the inherent complexity of such systems. Furthermore, deploying the trained policies in the real world demands a feature-rich environment along with multiple physically embodied agents, which may not be feasible due to monetary, physical, energy, or safety constraints. This work addresses these pain points by presenting a mixed-reality (MR) digital twin (DT) framework capable of: (i) boosting training speeds by selectively scaling parallelized simulation workloads on demand, and (ii) immersing the trained MARL policies in hybrid simulation-to-reality (sim2real) experiments. The viability and performance of the proposed framework are demonstrated through two representative use cases, covering cooperative as well as competitive classes of MARL problems. Across both case studies, we examine the effect of: (i) agent and environment parallelization on training time, and (ii) systematic domain randomization on zero-shot sim2real transfer. Results indicate up to a 76.3% reduction in training time with the proposed parallelization scheme and a sim2real gap as low as 2.9% using the proposed deployment method.