Multi-connectivity involves dynamic cluster formation among distributed access points (APs) and coordinated resource allocation from these APs, which makes efficient mobility management essential for users served over multiple links. In this paper, we propose a novel mobility management scheme for unmanned aerial vehicles (UAVs) that combines dynamic cluster reconfiguration with energy-efficient power allocation in a wireless interference network. Our objectives are to meet stringent reliability requirements, minimize joint power consumption, and reduce the frequency of cluster reconfiguration. To achieve these objectives, we propose a hierarchical multi-agent deep reinforcement learning (H-MADRL) framework tailored to dynamic clustering and power allocation. The edge cloud, connected to a set of APs through low-latency optical backhaul links, hosts the high-level agent responsible for the optimal clustering policy, while low-level agents reside in the APs and are responsible for the power allocation policy. To further improve learning efficiency, we propose a novel action-observation transition-driven learning algorithm in which the low-level agents use the action space of the high-level agent as part of their local observation space. This lets the low-level agents share partial information about the clustering policy and allocate power more efficiently. Simulation results demonstrate that the proposed distributed algorithm achieves performance comparable to that of the centralized algorithm. It also scales better: when the number of APs is doubled, the decision time for clustering and power allocation increases by only 10%, compared with a 90% increase for the centralized approach.
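The action-observation transition described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes, purely for illustration, that the high-level clustering action is a binary user-by-AP matrix and that each low-level agent sees only its own column of that matrix. The function names `low_level_observation` and `allocate_power`, the equal-split power rule, and the example values are all hypothetical; in the proposed framework a learned policy would take the place of the placeholder rule.

```python
from typing import List, Sequence


def low_level_observation(
    local_obs: Sequence[float],
    cluster_action: Sequence[Sequence[int]],
    ap_index: int,
) -> List[float]:
    """Append the high-level clustering action to an AP's local observation.

    Assumption: cluster_action[u][a] == 1 means AP `a` is in the serving
    cluster of user `u`. The AP only receives its own column, i.e. partial
    information about the clustering policy.
    """
    served = [float(row[ap_index]) for row in cluster_action]
    return list(local_obs) + served


def allocate_power(
    observation: Sequence[float], num_users: int, p_max: float
) -> List[float]:
    """Placeholder low-level policy (hypothetical, stands in for a learned one).

    Splits the power budget p_max equally among the users this AP serves,
    reading the serving indicators from the tail of the observation.
    """
    served = list(observation[-num_users:])
    count = sum(served)
    if count == 0:
        return [0.0] * num_users
    return [p_max * s / count for s in served]


if __name__ == "__main__":
    # Two users, three APs; values are made up for illustration.
    cluster_action = [[1, 1, 0], [0, 1, 1]]
    local_obs = [0.8, 0.3]  # hypothetical local channel measurements at AP 1
    obs = low_level_observation(local_obs, cluster_action, ap_index=1)
    print(obs)
    print(allocate_power(obs, num_users=2, p_max=1.0))
```

Passing only the AP's own column keeps the observation small and independent of the total number of APs, which is one plausible reading of "partial information" in the text above.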