Network-on-chip (NoC) architectures provide a scalable, high-performance, and reliable interconnect for emerging manycore systems. The routing policies used in NoCs have a significant impact on overall performance. Prior efforts have proposed reinforcement learning (RL)-based adaptive routing policies to avoid congestion and minimize latency in NoCs. The output quality of an RL policy depends on selecting a representative cost function and an effective update mechanism. Unfortunately, existing RL policies for NoC routing fail to represent path contention and regional congestion in the cost function. Moreover, the experience of packet flows sharing the same route is not fully incorporated into the RL update mechanism. In this paper, we present Q-RASP, a novel regional congestion-aware RL-based NoC routing policy that shares experience among packets using the same routes. Compared to state-of-the-art RL-based NoC routing implementations, Q-RASP reduces average packet latency by up to 18.3% and NoC energy consumption by up to 6.7%, with minimal area overhead.
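As background for the terms "cost function" and "update mechanism", the following is a minimal sketch of the classic Q-routing update on which RL-based routing policies are commonly built. It is not Q-RASP's cost function or update rule, which the abstract does not specify. The class name `QRouter`, the parameters `alpha`, `local_delay`, and `neighbor_estimate`, and the port names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class QRouter:
    """Hypothetical per-router Q-table for classic Q-routing.

    q[dest][neighbor] holds the estimated latency (in cycles) to deliver
    a packet to `dest` when it is forwarded through `neighbor`.
    """

    def __init__(self, neighbors, alpha=0.5):
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        # Every new destination row starts with zero estimates.
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Lowest latency estimate to `dest`; reported back to the upstream router."""
        return min(self.q[dest].values())

    def choose_port(self, dest):
        """Greedy policy: forward through the neighbor with the lowest estimate."""
        row = self.q[dest]
        return min(row, key=row.get)

    def update(self, dest, neighbor, local_delay, neighbor_estimate):
        """Move the estimate toward (local delay + neighbor's best estimate).

        Here the cost function is just the local queuing and link delay.
        A congestion-aware policy would fold further terms into the target.
        """
        target = local_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.alpha * (target - old)
        return self.q[dest][neighbor]


router = QRouter(["east", "north"], alpha=0.5)
router.update("D", "east", local_delay=4.0, neighbor_estimate=6.0)
print(router.choose_port("D"))
```

In this sketch the cost is only the delay observed at one hop, so a router learns about congestion further along the path slowly, one update at a time. That limitation is the kind of gap the abstract attributes to existing RL routing policies.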