Mobile networks continue to grow in complexity, and next-generation networks are expected to support both increasing traffic loads and more diverse services. As network complexity rises, optimizing antenna parameters under changing objectives becomes increasingly challenging. We propose a novel multi-agent reinforcement learning (MARL) algorithm for high-level control and orchestration of mobile networks. The Temporally Consistent Graph Q-Network (TC-GQN) algorithm learns a task-independent, self-predicting representation of the whole network that aggregates information from all base stations. A graph neural network is trained with a global reward function to assign coordinated local actions based on the learned encoding of the global network state. We evaluate the algorithm in a simulated environment, where it orchestrates an energy-saving feature across multiple sectors and multiple carriers under different quality of service (QoS) constraints. The proposed algorithm outperforms state-of-the-art graph-based baselines and a competitive rule-based controller, increasing hardware sleep time while maintaining QoS. Moreover, the learned representation enables rapid adaptation to changing intents.
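The idea of assigning coordinated local actions from a shared encoding of the global network state can be illustrated with a minimal sketch. This is not the TC-GQN implementation: the weights are random and untrained, the self-predicting representation objective and the global-reward training loop are omitted, and every name here (`select_actions`, `W_msg`, `W_q`, the two-feature sector description, the two-action sleep/active choice) is a hypothetical stand-in chosen for illustration.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def select_actions(features, neighbors, W_msg, W_q):
    """Pick one greedy action per node from local and global information."""
    # One round of message passing: each node averages its own features
    # with those of its neighbours, then applies a linear layer and ReLU.
    hidden = []
    for i, x in enumerate(features):
        pooled = mean_vectors([x] + [features[j] for j in neighbors[i]])
        hidden.append(relu(matvec(W_msg, pooled)))

    # Global encoding (assumption: a simple mean over all node embeddings).
    global_code = mean_vectors(hidden)

    # Per-node Q-values computed from the local embedding concatenated
    # with the shared global encoding; the action is the argmax.
    actions = []
    for h in hidden:
        q_values = matvec(W_q, h + global_code)
        actions.append(max(range(len(q_values)), key=q_values.__getitem__))
    return actions


rng = random.Random(0)
# Hypothetical per-sector features: [traffic load, fraction of active users].
features = [[0.9, 0.1], [0.2, 0.4], [0.05, 0.0]]
# Hypothetical neighbour relation between three sectors (a chain).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W_msg = random_matrix(4, 2, rng)  # 2 input features -> 4 hidden units
W_q = random_matrix(2, 8, rng)    # 4 local + 4 global -> 2 actions
actions = select_actions(features, neighbors, W_msg, W_q)
print(actions)  # one action index per sector, e.g. 0 = stay active, 1 = sleep
```

In a trained system the two weight matrices would be learned from the global reward, so that the per-sector choices become coordinated rather than arbitrary.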