on adaptive routing to balance network traffic for optimum performance. Ideally, adaptive routing forwards each packet along whichever minimal or non-minimal path is least congested. In practice, current adaptive routing algorithms estimate path congestion from local information such as output queue occupancy. Using local information to estimate global path congestion is inevitably inaccurate, because a router has no precise knowledge of link states a few hops away. This inaccuracy can lead to interconnect congestion. In this study, we present Q-adaptive routing, a multi-agent reinforcement learning routing scheme for Dragonfly systems. Q-adaptive routing enables routers to learn to route autonomously by leveraging reinforcement learning. The proposed scheme is highly scalable because it is fully distributed and requires no information shared between routers. Furthermore, we design a new two-level Q-table for Q-adaptive that makes it computationally light and saves 50% of router memory compared with the previous Q-routing. We implement Q-adaptive routing in the SST/Merlin simulator. Our evaluation results show that Q-adaptive routing achieves up to 10.5% system throughput improvement and 5.2x average packet latency reduction compared with adaptive routing algorithms. Remarkably, Q-adaptive can even outperform the optimal VALn non-minimal routing under the ADV+1 adversarial traffic pattern, with up to 3% system throughput improvement and 75% average packet latency reduction.
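To make the idea of routers learning delivery-time estimates more concrete, the following is a minimal sketch of the classic Q-routing update that Q-adaptive builds on, not the authors' implementation. The class `QRouter`, the helper `table_key`, the learning rate `alpha`, and the group/router indexing are all hypothetical names and assumptions introduced here for illustration. In particular, `table_key` shows only one plausible way a two-level table could be indexed (by destination group for remote traffic, by router for local traffic); the abstract does not specify the actual layout, and this sketch makes no claim to reproduce the reported 50% memory saving.

```python
from collections import defaultdict


def table_key(my_group, dest_group, dest_router):
    """Hypothetical two-level index (an assumption, not the paper's layout):
    remote destinations share one row per group, local ones get a row per router."""
    if dest_group != my_group:
        return ("group", dest_group)
    return ("router", dest_router)


class QRouter:
    """Hypothetical router agent that learns per-port delivery-time estimates."""

    def __init__(self, ports, alpha=0.5, initial_estimate=0.0):
        self.ports = list(ports)
        self.alpha = alpha  # learning rate (assumed value)
        # q[key][port] = estimated time to deliver a packet for `key` via `port`
        self.q = defaultdict(lambda: {p: initial_estimate for p in self.ports})

    def best_port(self, key):
        """Pick the output port with the lowest estimated delivery time."""
        row = self.q[key]
        return min(self.ports, key=lambda p: row[p])

    def best_estimate(self, key):
        """Value a router reports back to the upstream neighbor."""
        return min(self.q[key].values())

    def update(self, key, port, local_delay, neighbor_estimate):
        """Q-routing update: move the old estimate toward
        (local queueing + transmission delay) + (neighbor's best remaining estimate)."""
        old = self.q[key][port]
        target = local_delay + neighbor_estimate
        self.q[key][port] = old + self.alpha * (target - old)
        return self.q[key][port]


# Usage: a router in group 0 forwards a packet destined for group 3.
router = QRouter(ports=["p0", "p1"], alpha=0.5)
key = table_key(my_group=0, dest_group=3, dest_router=7)
router.update(key, "p0", local_delay=2.0, neighbor_estimate=4.0)
next_port = router.best_port(key)  # "p0" now looks slower than the untried "p1"
```

Each router updates its table using only its own measured delay and the single estimate returned by the next hop, which is the sense in which such a scheme can stay fully distributed.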