Current trends in networking propose the use of Machine Learning (ML) for a wide variety of network optimization tasks. As such, many efforts have been made to produce ML-based solutions for Traffic Engineering (TE), which is a fundamental problem in ISP networks. However, state-of-the-art TE optimizers still rely on traditional optimization techniques, such as Local Search, Constraint Programming, or Linear Programming. In this paper, we present MAGNNETO, a distributed ML-based framework that leverages Multi-Agent Reinforcement Learning and Graph Neural Networks for TE optimization. MAGNNETO deploys a set of agents across the network that learn and communicate in a distributed fashion via message exchanges between neighboring agents. In particular, we apply this framework to optimize OSPF link weights, with the goal of minimizing network congestion. In our evaluation, we compare MAGNNETO against several state-of-the-art TE optimizers on more than 75 topologies (with up to 153 nodes and 354 links), including realistic traffic loads. Our experimental results show that, thanks to its distributed nature, MAGNNETO achieves performance comparable to state-of-the-art TE optimizers with significantly lower execution times. Moreover, our ML-based solution demonstrates a strong generalization capability, operating successfully in new networks unseen during training.
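To make the optimization target concrete, the following minimal sketch illustrates how OSPF-style link weights determine routing and, in turn, congestion. It is not MAGNNETO itself: it contains no agents, no Graph Neural Network, and no reinforcement learning. It assumes that congestion is measured as the maximum link utilization and that each demand follows a single shortest path (no ECMP splitting); both are simplifications chosen for illustration. The function names, the three-node topology, and the traffic values are all hypothetical.

```python
import heapq


def shortest_path(weights, src, dst):
    """Dijkstra over {(u, v): weight}; returns the links on one shortest path."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == dst:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"no path from {src} to {dst}")
    path = []
    node = dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


def max_link_utilization(weights, capacities, demands):
    """Route every demand on its shortest path and return the worst load/capacity ratio."""
    load = {link: 0.0 for link in weights}
    for (src, dst), volume in demands.items():
        for link in shortest_path(weights, src, dst):
            load[link] += volume
    return max(load[link] / capacities[link] for link in load)


# Hypothetical three-node topology with directed links of equal capacity.
capacities = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 10.0}
demands = {("A", "B"): 6.0, ("A", "C"): 6.0}

# With a high weight on A->C, both demands share link A->B.
weights_before = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3}
# Lowering the A->C weight moves the A->C demand onto the direct link.
weights_after = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1}

congested = max_link_utilization(weights_before, capacities, demands)
relieved = max_link_utilization(weights_after, capacities, demands)
print(congested, relieved)  # prints: 1.2 0.6
```

In this toy case, changing a single link weight halves the maximum link utilization. A TE optimizer searches the space of such weight settings; the abstract's claim is that MAGNNETO performs this search with distributed agents rather than a centralized solver.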