We investigate the problem of wireless routing in integrated access backhaul (IAB) networks consisting of fiber-connected and wireless base stations and multiple users. The physical constraints of these networks prevent the use of a central controller, and base stations have limited access to real-time network conditions. We aim to maximize the packet arrival ratio while minimizing packet latency. To this end, we formulate the problem as a multi-agent partially observed Markov decision process (POMDP). To solve this problem, we develop a Relational Advantage Actor Critic (Relational A2C) algorithm that uses Multi-Agent Reinforcement Learning (MARL) and information about similar destinations to derive a joint routing policy in a distributed manner. We present three training paradigms for this algorithm and demonstrate its ability to achieve near-centralized performance. Our results show that Relational A2C outperforms other reinforcement learning algorithms, leading to increased network efficiency and reduced selfish agent behavior. To the best of our knowledge, this work is the first to optimize the routing strategy for IAB networks.