Transformer-based large language models are increasingly constrained by data movement as communication bandwidth drops sharply beyond the chip boundary. Wafer-scale integration using wafer-on-wafer hybrid bonding alleviates this limitation by providing ultra-high bandwidth between reticles on bonded wafers. In this paper, we investigate how the physical placement of reticles on wafers influences the achievable network topology and the resulting communication performance. Starting from a 2D mesh-like baseline, we propose four reticle placements (Aligned, Interleaved, Rotated, and Contoured) that improve throughput by up to 250%, reduce latency by up to 36%, and decrease energy per transmitted byte by up to 38%.