Autonomous driving has advanced considerably over the past several decades, and Multi-Agent Reinforcement Learning (MARL) promises to meet the essential need for autonomous vehicle control in wireless connected vehicle networks. In MARL, how to effectively decompose global feedback into the relative contributions of individual agents is one of the most fundamental problems. However, environment volatility caused by vehicle movement and wireless disturbance can significantly alter the time-varying topological relationships among agents, which makes Value Decomposition (VD) challenging. To cope with this volatility, it becomes imperative to design a dynamic VD framework. In this paper, we therefore propose a novel Stochastic VMIX (SVMIX) methodology that takes dynamic topological features into account during VD and incorporates the corresponding components into a multi-agent actor-critic architecture. In particular, a Stochastic Graph Neural Network (SGNN) is leveraged to capture the underlying dynamics in topological features and to improve the flexibility of VD against environment volatility. Finally, the superiority of SVMIX is verified through extensive simulations.