Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits how well agents generalize to different graphs and affects their reactivity, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size by letting information flow continuously through the whole graph. We propose a recurrent message-passing model that iterates in step with the environment and allows nodes to build a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and combined with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
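As a rough illustration of the recurrent message-passing idea described above, the following sketch runs one round of neighbor-to-neighbor message exchange per environment step and lets an agent read the state of the node it occupies. It is not the authors' architecture: the mean aggregation, the tanh update, the random untrained weights, and all names (`init_params`, `message_passing_step`, `agent_observation`, `W_in`, `W_self`, `W_msg`) are assumptions made purely for illustration.

```python
import numpy as np


def init_params(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights; hypothetical names, illustration only."""
    rng = np.random.default_rng(seed)
    scale = 0.5
    return {
        "W_in": rng.normal(0.0, scale, (input_dim, hidden_dim)),
        "W_self": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "W_msg": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
    }


def message_passing_step(hidden, node_inputs, adjacency, params):
    """One message-passing iteration, meant to run once per environment step.

    hidden:      (N, H) node states carried over from the previous step
    node_inputs: (N, D) local node features observed at this step
    adjacency:   (N, N) symmetric 0/1 matrix of the graph
    """
    degree = adjacency.sum(axis=1, keepdims=True)
    # Each node averages the states its direct neighbors sent it.
    mean_msgs = (adjacency @ hidden) / np.maximum(degree, 1)
    # Assumed update rule: combine local input, own state, and messages.
    return np.tanh(
        node_inputs @ params["W_in"]
        + hidden @ params["W_self"]
        + mean_msgs @ params["W_msg"]
    )


def agent_observation(hidden, agent_node):
    """An agent reads the state of the node it is currently located at."""
    return hidden[agent_node]


# Usage: a 4-node line graph 0 - 1 - 2 - 3 with a signal only at node 0.
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
params = init_params(input_dim=2, hidden_dim=8)
node_inputs = np.zeros((4, 2))
node_inputs[0] = 1.0
hidden = np.zeros((4, 8))

for env_step in range(4):
    hidden = message_passing_step(hidden, node_inputs, adjacency, params)

obs = agent_observation(hidden, agent_node=3)
```

Because node states persist across environment steps, information from node 0 moves one hop further with each iteration, so distant nodes eventually see it without any node ever observing more than its direct neighbors. A trained model would learn the weights jointly with the chosen reinforcement learning algorithm; here they are random.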