With the continuous growth in communication network complexity and traffic volume, communication load balancing solutions are receiving increasing attention. Specifically, reinforcement learning (RL)-based methods have shown impressive performance compared with traditional rule-based methods. However, standard RL methods generally require an enormous amount of data to train, and generalize poorly to scenarios that are not encountered during training. We propose a policy reuse framework in which a policy selector chooses the most suitable pre-trained RL policy to execute based on the current traffic condition. Our method hinges on a policy bank composed of policies trained on a diverse set of traffic scenarios. When deploying to an unknown traffic scenario, we select a policy from the policy bank based on the similarity between the previous-day traffic of the current scenario and the traffic observed during training. Experiments demonstrate that this framework can outperform classical and adaptive rule-based methods by a large margin.
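The selection step described above can be sketched as a nearest-neighbour lookup over the policy bank. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how similarity is measured, so Euclidean distance between 24-value hourly traffic profiles is assumed here, and the names `select_policy`, `daily_profile`, `policy_bank`, and the two bank entries are hypothetical.

```python
import math


def daily_profile(peak_hour):
    """Hypothetical 24-hour traffic profile with a single bump at peak_hour."""
    return [math.exp(-((h - peak_hour) ** 2) / 18.0) for h in range(24)]


def select_policy(prev_day_traffic, policy_bank):
    """Return the key of the bank entry whose training traffic is closest
    to the previous-day traffic (assumption: Euclidean distance)."""
    if not policy_bank:
        raise ValueError("policy bank is empty")
    return min(
        policy_bank,
        key=lambda name: math.dist(
            prev_day_traffic, policy_bank[name]["train_traffic"]
        ),
    )


# Hypothetical policy bank: each entry pairs the traffic seen during
# training with a stand-in for a pre-trained RL policy.
policy_bank = {
    "daytime_peak": {
        "train_traffic": daily_profile(13),
        "policy": lambda observation: "action_a",  # placeholder policy
    },
    "evening_peak": {
        "train_traffic": daily_profile(20),
        "policy": lambda observation: "action_b",  # placeholder policy
    },
}

# Previous-day traffic of the unseen deployment scenario (peak at 19:00).
prev_day = daily_profile(19)
chosen = select_policy(prev_day, policy_bank)
policy = policy_bank[chosen]["policy"]
```

A learned selector or a different similarity measure could replace `math.dist` without changing the surrounding structure; the point is only that the policy is chosen once per deployment day from traffic observed the day before, with no retraining.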