Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits generalizability across different graphs and affects the reactivity of agents, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size by establishing a continuous information flow across the whole graph. We propose a recurrent message-passing model that iterates with the environment's steps and allows nodes to create a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and in combination with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
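The core mechanism described above — nodes repeatedly exchanging messages with their immediate neighbors so that, over successive environment steps, each node accumulates information from progressively larger portions of the graph — can be illustrated with a minimal sketch. This is not the paper's actual model; the function `message_passing_step`, the weight matrices, and the toy ring graph are all hypothetical, and real implementations would use learned parameters (e.g., a trained GRU-style update).

```python
import numpy as np

def message_passing_step(h, adj, W_msg, W_upd, b):
    """One recurrent message-passing iteration (hypothetical minimal form).

    h:     (n_nodes, d) per-node hidden states, carried across environment steps
    adj:   (n_nodes, n_nodes) symmetric adjacency matrix (1 = edge)
    W_msg: (d, d) message transform; W_upd: (2d, d) recurrent update transform
    """
    # Each node sends a transformed copy of its current state to its neighbors.
    messages = np.tanh(h @ W_msg)                                  # (n, d)
    # Sum the incoming messages from direct neighbors only.
    agg = adj @ messages                                           # (n, d)
    # Recurrent update: combine the previous state with aggregated messages.
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_upd + b)   # (n, d)

# Toy 4-node ring graph with random (untrained) parameters.
n, d = 4, 8
rng = np.random.default_rng(0)
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

h = rng.normal(scale=0.1, size=(n, d))      # initial node states (stand in for node features)
W_msg = rng.normal(scale=0.1, size=(d, d))
W_upd = rng.normal(scale=0.1, size=(2 * d, d))
b = np.zeros(d)

# One iteration per environment step: information spreads one hop per step,
# so after k steps each node's state reflects its k-hop neighborhood.
for step in range(3):
    h = message_passing_step(h, adj, W_msg, W_upd, b)
```

In this sketch, the hidden states `h` would serve as the learned graph observations handed to the reinforcement learning agents at their respective node locations; because the recurrence runs alongside the environment, the states can track changes in the graph over time.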