Traffic signal control (TSC) is a challenging problem within intelligent transportation systems and has been tackled using multi-agent reinforcement learning (MARL). Centralized approaches are often infeasible for large-scale TSC problems; decentralized approaches provide scalability but introduce new challenges, such as partial observability. Communication plays a critical role in decentralized MARL, as agents must learn to exchange information through messages to better understand the system and coordinate effectively. Deep MARL has been used to enable inter-agent communication by learning communication protocols in a differentiable manner. However, many deep MARL communication frameworks proposed for TSC allow agents to communicate with all other agents at all times, which can add to the existing noise in the system and degrade overall performance. In this study, we propose a communication-based MARL framework for large-scale TSC. Our framework allows each agent to learn a communication policy that dictates "which" part of the message is sent "to whom". In essence, agents select the recipients of their messages and exchange variable-length messages with them. This results in a decentralized and flexible communication mechanism in which agents use the communication channel only when necessary. We designed two networks for evaluation: a synthetic $4 \times 4$ grid network and a real-world network based on the Pasubio neighborhood in Bologna. Our framework achieved the lowest network congestion among the methods compared, with agents utilizing about $47\%$ to $65\%$ of the communication channel. Ablation studies further demonstrated the effectiveness of the communication policies learned within our framework.
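As an illustration of the "which part, to whom" idea described in the abstract, the following is a minimal sketch and not the framework's actual implementation. It assumes that a communication policy can be represented as one binary mask per sender-recipient pair over the components of a fixed-size message, and that channel utilization is the fraction of mask entries that are switched on; both are assumptions made here for illustration. The names `sample_masks`, `deliver`, and `channel_utilization` are hypothetical, and the random sampling merely stands in for a learned policy.

```python
import random


def sample_masks(n_agents, msg_dim, send_prob, rng):
    """Hypothetical stand-in for a learned communication policy.

    Returns a dict mapping (sender, recipient) to a 0/1 mask over the
    msg_dim message components. A real policy would be learned; here
    each entry is simply drawn at random with probability send_prob.
    """
    masks = {}
    for i in range(n_agents):
        for j in range(n_agents):
            if i != j:
                masks[(i, j)] = [
                    1 if rng.random() < send_prob else 0 for _ in range(msg_dim)
                ]
    return masks


def deliver(messages, masks):
    """Send each agent's selected message components to each recipient.

    inbox[j] is a list of (sender, payload) pairs, where payload is a
    variable-length list of (component_index, value) pairs. A sender
    whose mask for j is all zeros sends nothing to j.
    """
    inbox = {j: [] for j in range(len(messages))}
    for (i, j), mask in masks.items():
        if any(mask):
            payload = [(k, messages[i][k]) for k, g in enumerate(mask) if g]
            inbox[j].append((i, payload))
    return inbox


def channel_utilization(masks):
    """Fraction of available (pair, component) slots actually used.

    This definition is an assumption for illustration only.
    """
    total = sum(len(mask) for mask in masks.values())
    used = sum(sum(mask) for mask in masks.values())
    return used / total if total else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, msg_dim = 3, 4
    messages = [[rng.uniform(-1, 1) for _ in range(msg_dim)] for _ in range(n_agents)]
    masks = sample_masks(n_agents, msg_dim, send_prob=0.5, rng=rng)
    inbox = deliver(messages, masks)
    print("utilization:", channel_utilization(masks))
    print("messages received by agent 0:", inbox[0])
```

Under these assumptions, an all-ones mask for every pair corresponds to the "communicate with all agents at all times" baseline (utilization 1.0), while sparser masks correspond to selective, variable-length communication.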