Ultra-reliable low-latency communication (URLLC) is envisioned to enable 5G use cases with strict reliability and latency requirements. One approach to enabling URLLC services is to use Reinforcement Learning (RL) to allocate wireless resources efficiently. However, conventional RL methods typically optimize all decision variables in a single control loop, even though those variables are deployed at different network layers. This imposes significant practical limits on the control loop's delay and causes excessive signaling and energy consumption. In this paper, we propose a multi-agent Hierarchical RL (HRL) framework that enables multi-level policies with different control-loop timescales. Agents with faster control loops are deployed closer to the base station, while agents with slower control loops sit at the edge or closer to the core network and provide high-level guidelines for low-level actions. Applying our HRL framework to a use case from the prior art, we optimize the maximum number of retransmissions and the transmission power of industrial devices. Our extensive simulation results for the factory automation scenario show that the HRL framework achieves better performance than the baseline single-agent RL method, with significantly less signaling overhead and delay.
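The two-timescale structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the slower agent chooses the maximum number of retransmissions once every `slow_period` slots and the faster agent chooses the transmission power in every slot; the abstract does not state which agent controls which variable. The `EpsilonGreedyAgent` class, the `toy_reward` function, and all numeric values are hypothetical stand-ins chosen for illustration only.

```python
import random


class EpsilonGreedyAgent:
    """Tabular epsilon-greedy learner; a stand-in for a trained RL policy."""

    def __init__(self, actions, rng, epsilon=0.1, lr=0.1):
        self.actions = list(actions)
        self.rng = rng
        self.epsilon = epsilon
        self.lr = lr
        self.q = {}  # (context, action) -> running value estimate

    def act(self, context):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((context, a), 0.0))

    def update(self, context, action, reward):
        key = (context, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.lr * (reward - old)


def toy_reward(max_retx, power, rng):
    """Hypothetical reward: delivery success minus energy and delay penalties."""
    p_single = min(0.95, 0.3 + 0.15 * power)          # per-attempt success
    p_success = 1.0 - (1.0 - p_single) ** (max_retx + 1)
    delivered = 1.0 if rng.random() < p_success else 0.0
    return delivered - 0.05 * power - 0.02 * max_retx


def run_hierarchy(num_slots=1000, slow_period=50, seed=0):
    """Run a slow high-level loop and a fast low-level loop side by side."""
    rng = random.Random(seed)
    high = EpsilonGreedyAgent(actions=[0, 1, 2, 3], rng=rng)  # max retransmissions
    low = EpsilonGreedyAgent(actions=[1, 2, 3, 4], rng=rng)   # power level index
    high_decisions = low_decisions = 0
    max_retx = None
    window_reward = 0.0

    for t in range(num_slots):
        if t % slow_period == 0:
            # Slow loop: learn from the last window, then issue a new guideline.
            if max_retx is not None:
                high.update("global", max_retx, window_reward / slow_period)
            max_retx = high.act("global")
            high_decisions += 1
            window_reward = 0.0

        # Fast loop: act in every slot, conditioned on the current guideline.
        power = low.act(max_retx)
        low_decisions += 1
        reward = toy_reward(max_retx, power, rng)
        low.update(max_retx, power, reward)
        window_reward += reward

    return high_decisions, low_decisions


if __name__ == "__main__":
    high_n, low_n = run_hierarchy()
    print(f"high-level decisions: {high_n}, low-level decisions: {low_n}")
```

In this sketch the high-level agent sends a guideline only once per window, so the number of messages crossing from the slow loop to the fast loop shrinks by a factor of `slow_period` compared with a single loop that decides both variables every slot. That is the intuition behind the reduced signaling overhead, not a reproduction of the paper's results.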