In the past few years, Deep Reinforcement Learning (DRL) has become a valuable solution for automatically learning efficient resource management strategies in complex networks with time-varying statistics. However, the increased complexity of 5G and Beyond networks requires correspondingly more complex learning agents, and the learning process itself might end up competing with users for communication and computational resources. This creates friction: on the one hand, the learning process needs resources to quickly converge to an effective strategy; on the other hand, it needs to be efficient, i.e., take as few resources as possible from the users' data plane, so as not to throttle users' Quality of Service (QoS). In this paper, we investigate this trade-off and propose a dynamic strategy to balance the resources assigned to the data plane and those reserved for learning. With the proposed approach, a learning agent can quickly converge to an efficient resource allocation strategy and adapt to changes in the environment, as in the Continual Learning (CL) paradigm, while minimizing the impact on the users' QoS. Simulation results show that the proposed method outperforms static allocation methods with minimal learning overhead, almost reaching the performance of an ideal out-of-band CL solution.