Existing high-performance computing (HPC) interconnection architectures are built on high-radix switches, which limit injection/local performance and introduce latency, energy, and cost overhead. Emerging wafer-scale packaging and high-speed wireline technologies provide high-density, low-latency, high-bandwidth connectivity, and thus promise to support directly connected high-radix interconnection architectures. In this paper, we propose a wafer-based interconnection architecture called Switch-Less-Dragonfly-on-Wafers. By using distributed high-bandwidth networks-on-chip-on-wafer, the architecture eliminates the costly high-radix switches of the Dragonfly topology while increasing injection/local throughput and maintaining global throughput. Based on the proposed architecture, we also introduce baseline and improved deadlock-free minimal/non-minimal routing algorithms that require only one additional virtual channel. Extensive evaluations show that Switch-Less-Dragonfly-on-Wafers outperforms the traditional switch-based Dragonfly in both cost and performance. Similar approaches can be applied to other switch-based direct topologies, and thus promise to power future large-scale supercomputers.