Non-differentiable controllers and rule-based policies are widely used to control real systems such as telecommunication networks and robots. In particular, these policies can dynamically configure the parameters of mobile network base station antennas to improve user coverage and quality of service. Motivated by the antenna tilt control problem, we introduce Model-Based Residual Policy Learning (MBRPL), a practical reinforcement learning (RL) method. MBRPL enhances existing policies through a model-based approach, which improves sample efficiency and requires fewer interactions with the actual environment than off-the-shelf RL methods. To the best of our knowledge, this is the first paper that examines a model-based approach for antenna control. Experimental results show that our method delivers strong initial performance while improving sample efficiency over previous RL methods, which is one step towards deploying these algorithms in real networks.
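To make the idea concrete, the following is a minimal illustrative sketch of residual learning on top of a fixed rule, with the residual tuned inside a learned model rather than on the real system. It is not the MBRPL algorithm from the paper: the scalar "tilt error" state, the toy dynamics, the linear model, the grid search, and every function name are assumptions chosen only for illustration.

```python
import random

def baseline_policy(s):
    # Hypothetical rule-based controller: cautiously corrects half the error.
    return -0.5 * s

def real_env_step(s, a):
    # Toy stand-in for the real network (assumed dynamics, hidden from the learner).
    return s + a

def collect_real_data(n, rng):
    # A small number of real interactions: baseline action plus exploration noise.
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = baseline_policy(s) + rng.gauss(0.0, 0.1)
        data.append((s, a, real_env_step(s, a)))
    return data

def fit_model(data):
    # Least-squares fit of s' ~ alpha*s + beta*a via the 2x2 normal equations.
    sss = sum(s * s for s, a, y in data)
    ssa = sum(s * a for s, a, y in data)
    saa = sum(a * a for s, a, y in data)
    ssy = sum(s * y for s, a, y in data)
    say = sum(a * y for s, a, y in data)
    det = sss * saa - ssa * ssa
    alpha = (ssy * saa - say * ssa) / det
    beta = (say * sss - ssy * ssa) / det
    return alpha, beta

def model_return(w, model, s0=1.0, horizon=5):
    # Roll out baseline + residual (w * s) inside the learned model only.
    alpha, beta = model
    s, total = s0, 0.0
    for _ in range(horizon):
        a = baseline_policy(s) + w * s
        s = alpha * s + beta * a
        total -= abs(s)  # reward: negative absolute error
    return total

rng = random.Random(0)
model = fit_model(collect_real_data(20, rng))       # 20 real interactions in total
candidates = [i / 10 for i in range(-10, 11)]       # residual gains to try
best_w = max(candidates, key=lambda w: model_return(w, model))
```

With the residual gain at zero, the combined policy is exactly the baseline rule, which is one way to read the "strong initial performance" claim. All policy improvement in this sketch happens in model rollouts, so the only real-environment cost is the 20 transitions used to fit the model.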