Communication load balancing aims to distribute load evenly across the available resources and thereby improve the quality of service of network systems. Once load balancing (LB) is formulated as a Markov decision process, reinforcement learning (RL) can be applied to it, and RL has recently proven effective for the LB problem. Classical RL, however, requires an explicit reward definition. Engineering this reward function is challenging because it requires expert knowledge and there is no general consensus on the form of an optimal reward function. In this work, we tackle the communication load balancing problem with an inverse reinforcement learning (IRL) approach. To the best of our knowledge, this is the first time IRL has been successfully applied in the field of communication load balancing. Specifically, we first infer a reward function from a set of demonstrations and then learn an RL load balancing policy with the inferred reward function. Compared to classical RL-based solutions, the proposed solution can be more general and more suitable for real-world scenarios. Experimental evaluations on different simulated traffic scenarios show that our method is effective and outperforms the baselines by a considerable margin.
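The two-stage idea (infer a reward from demonstrations, then learn a policy under that reward) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the method evaluated in this work: the toy two-cell environment, the hypothetical demonstrator `expert_action`, the one-hot state features, the single feature-matching step used as a stand-in for IRL, and value iteration used in place of an RL learner.

```python
import numpy as np

K = 3                     # state is d = load(cell 0) - load(cell 1), clipped to [-K, K]
N_STATES = 2 * K + 1      # state index i encodes d = i - K
GAMMA, HORIZON = 0.9, 30  # discount factor and demonstration length (assumed values)

def step(i, a):
    """Action 0 assigns the next user to cell 0 (d + 1); action 1 to cell 1 (d - 1)."""
    return int(np.clip(i + (1 if a == 0 else -1), 0, N_STATES - 1))

def expert_action(i):
    """Hypothetical demonstrator: send the user to the less loaded cell."""
    return 1 if i - K > 0 else 0

def rollout(start):
    """One demonstration: the sequence of states visited by the demonstrator."""
    states = [start]
    for _ in range(HORIZON - 1):
        states.append(step(states[-1], expert_action(states[-1])))
    return states

def discounted_visits(trajectories):
    """Empirical discounted feature expectations with one-hot state features."""
    mu = np.zeros(N_STATES)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu[s] += GAMMA ** t
    return mu / len(trajectories)

def random_policy_visits():
    """Exact discounted visitation of a uniform-random policy, uniform start state."""
    P = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES):
        for a in (0, 1):
            P[i, step(i, a)] += 0.5
    dist, mu = np.full(N_STATES, 1.0 / N_STATES), np.zeros(N_STATES)
    for t in range(HORIZON):
        mu += (GAMMA ** t) * dist
        dist = dist @ P
    return mu

# Stage 1 (reward inference): one feature-matching step. States the demonstrator
# visits more often than a random policy receive a higher reward.
demos = [rollout(s) for s in range(N_STATES)]
reward = discounted_visits(demos) - random_policy_visits()

# Stage 2 (policy learning): value iteration under the inferred reward,
# where the reward is collected on the state that the action leads to.
def greedy_policy(r, iters=200):
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = np.array([[r[step(i, a)] + GAMMA * V[step(i, a)] for a in (0, 1)]
                      for i in range(N_STATES)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

learned_policy = greedy_policy(reward)
expert_policy = np.array([expert_action(i) for i in range(N_STATES)])
print("inferred reward:", np.round(reward, 2))
print("policy matches demonstrator:", bool((learned_policy == expert_policy).all()))
```

In this toy setting the inferred reward peaks at the balanced states, so the learned policy steers load toward them without any hand-written reward. A realistic system would have a far larger state space and would need a proper IRL algorithm and an RL learner in place of these simplifications.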