Deep Reinforcement Learning (DRL) algorithms have recently made significant strides in improving network performance. Nonetheless, their practical use remains limited in the absence of safe exploration and safe decision-making. In commercial solutions, reliable and safe-to-operate systems are of paramount importance. To address this problem, we propose a safe learning-based load balancing algorithm for Software-Defined Wide Area Networks (SD-WAN), which combines DRL with a Control Barrier Function (CBF). It safely projects unsafe actions onto feasible ones during both training and testing, and it guides learning towards safe policies. We implemented the solution on GPU, which accelerates training by approximately 110x and achieves model updates for on-policy methods within a few seconds, making the solution practical. We show that our approach delivers near-optimal Quality-of-Service (QoS) performance in terms of end-to-end delay while respecting safety requirements related to link capacity constraints. We also demonstrate that on-policy learning based on Proximal Policy Optimization (PPO) performs better than off-policy learning with Deep Deterministic Policy Gradient (DDPG) when both are combined with a CBF for safe load balancing.
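To illustrate what projecting an unsafe action onto a feasible one can mean for load balancing, the following is a minimal sketch under simplifying assumptions; it is not the CBF formulation used in this work. It assumes a single traffic demand split across parallel paths, treats the action as a vector of split ratios, and replaces the CBF with a plain Euclidean projection onto the set of splits that respect per-path capacity. The function name `project_to_safe_split` and all parameters are hypothetical.

```python
def project_to_safe_split(ratios, demand, capacities, iters=60):
    """Project split ratios onto {w : sum(w) = 1, 0 <= w_k <= c_k / demand}.

    Simplified stand-in for a safety layer (an assumption of this sketch):
    a Euclidean projection onto a capped simplex, solved by bisection on
    the shift tau in w_k = clip(r_k - tau, 0, u_k).
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    # Largest safe share of the demand that each path can carry.
    upper = [min(1.0, c / demand) for c in capacities]
    if sum(upper) < 1.0:
        raise ValueError("demand exceeds total capacity; no safe split exists")

    def total(tau):
        return sum(min(max(r - tau, 0.0), u) for r, u in zip(ratios, upper))

    # total(lo) = sum(upper) >= 1 and total(hi) = 0, so a root is bracketed.
    lo, hi = min(ratios) - 1.0, max(ratios)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(max(r - tau, 0.0), u) for r, u in zip(ratios, upper)]


if __name__ == "__main__":
    # The raw action would put 70 units on a path that can carry only 50.
    safe = project_to_safe_split([0.7, 0.2, 0.1], 100.0, [50.0, 60.0, 40.0])
    print([round(w, 3) for w in safe])  # approximately [0.5, 0.3, 0.2]
```

An action that already satisfies the capacity constraints is returned essentially unchanged, so a layer of this kind only intervenes when the policy proposes an unsafe split.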