Transformer-based large language models are increasingly constrained by data movement as communication bandwidth drops sharply beyond the chip boundary. Wafer-scale integration using wafer-on-wafer hybrid bonding alleviates this limitation by providing ultra-high bandwidth between reticles on bonded wafers. In this paper, we investigate how the physical placement of reticles on wafers influences the achievable network topology and the resulting communication performance. Starting from a 2D mesh-like baseline, we propose four reticle placements (Aligned, Interleaved, Rotated, and Contoured) that improve throughput by up to 250%, reduce latency by up to 36%, and decrease energy per transmitted byte by up to 38%.