We study distributed control of networked systems through reinforcement learning, where neural policies must simultaneously be scalable, expressive, and stabilizing. We introduce a policy parameterization that embeds Graph Neural Networks (GNNs) into a Youla-like magnitude-direction parameterization, yielding distributed stochastic controllers that guarantee network-level closed-loop stability by design. The magnitude is implemented as a stable operator consisting of a GNN acting on disturbance feedback, while the direction is given by a GNN acting on local observations. We prove robustness of the closed loop to perturbations in both the graph topology and the model parameters, and we show how to integrate our parameterization with Proximal Policy Optimization. Experiments on a multi-agent navigation task show that policies trained on small networks transfer directly to larger networks and unseen topologies, and that they achieve higher returns and lower variance than a state-of-the-art MARL baseline while preserving stability.
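To make the composition described above concrete, the following is a minimal sketch, assuming a product form in which the per-node mean action is a bounded magnitude times a unit direction. The class and parameter names (GraphConv, MagnitudeDirectionPolicy, gamma), the sigmoid squashing that stands in for the stable magnitude operator, and the Gaussian action noise are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical sketch (not the paper's implementation): a distributed policy whose
# per-node mean action is the product of a bounded "magnitude" GNN acting on
# disturbance-feedback features and a normalized "direction" GNN acting on local
# observations. All names and the bounding mechanism are illustrative assumptions.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One round of local message passing: each node mixes its own features
    with its neighbors' features through a normalized adjacency matrix A."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, A):
        # x: (num_nodes, in_dim), A: (num_nodes, num_nodes), row-normalized
        return torch.tanh(self.lin_self(x) + self.lin_neigh(A @ x))


class MagnitudeDirectionPolicy(nn.Module):
    """Mean action per node = magnitude * unit direction.
    The magnitude path is squashed into [0, gamma], so the mean control stays
    bounded regardless of the direction GNN's weights; this is a crude stand-in
    for the stable-operator construction used in the paper."""

    def __init__(self, obs_dim, dist_dim, act_dim, hidden=32, gamma=1.0):
        super().__init__()
        self.gamma = gamma
        self.mag_gnn = GraphConv(dist_dim, hidden)
        self.mag_head = nn.Linear(hidden, 1)
        self.dir_gnn = GraphConv(obs_dim, hidden)
        self.dir_head = nn.Linear(hidden, act_dim)
        # Gaussian exploration noise is an assumption; the paper's stochastic
        # parameterization may enter differently.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs, dist_fb, A):
        # obs: local observations (num_nodes, obs_dim)
        # dist_fb: disturbance-feedback features (num_nodes, dist_dim)
        mag = self.gamma * torch.sigmoid(self.mag_head(self.mag_gnn(dist_fb, A)))
        direction = self.dir_head(self.dir_gnn(obs, A))
        direction = direction / (direction.norm(dim=-1, keepdim=True) + 1e-8)
        mean = mag * direction  # (num_nodes, act_dim)
        return torch.distributions.Normal(mean, self.log_std.exp())
```

In a PPO loop, one would sample per-node actions from the returned distribution and use its log-probabilities in the clipped surrogate objective; under this sketch, boundedness of the mean control follows from the magnitude path alone, independently of the direction GNN's weights.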