2.5D integration technology is gaining traction because it helps contain the exponentially growing design cost of modern integrated circuits. A crucial part of a 2.5D stacked chip is a low-latency, high-throughput inter-chiplet interconnect (ICI). Two major factors affecting latency and throughput are the topology of the links between chiplets and the chiplet placement. In this work, we present PlaceIT, a novel methodology to jointly optimize the ICI topology and the chiplet placement. While state-of-the-art methods either optimize the chiplet placement for a predetermined ICI topology or select one topology out of a set of candidates, we generate a completely new topology for each placement. Our process of inferring placement-based ICI topologies connects chiplets that are in close proximity to each other, which makes it particularly attractive for chips with silicon bridges or passive silicon interposers, where link lengths are severely limited. We provide an open-source implementation of our method that optimizes the placement of homogeneously or heterogeneously shaped chiplets and the ICI topology connecting them for a user-defined mix of four different traffic types. We evaluate our methodology using synthetic traffic and traces, and we compare our results to a 2D mesh baseline. PlaceIT reduces the latency of synthetic L1-to-L2 and L2-to-memory traffic, the two most important types of cache-coherency traffic, by up to 28% and 62%, respectively. It also achieves an average packet latency reduction of up to 18% on traffic traces. PlaceIT enables the construction of 2.5D stacked chips with low-latency ICIs.
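To make the idea of a placement-based topology concrete, the following is a minimal sketch and not the PlaceIT implementation. It assumes each chiplet is reduced to a single center coordinate, that a link may be drawn between any two chiplets whose Manhattan distance does not exceed a hypothetical `max_link_length`, and that the average hop count is an acceptable stand-in for latency. The names `infer_topology` and `average_hops` are illustrative only.

```python
from collections import deque
from itertools import combinations


def infer_topology(positions, max_link_length):
    """Link every pair of chiplets whose Manhattan distance is within the limit.

    Assumption: chiplets are points; real chiplets have shapes and PHY locations.
    """
    links = set()
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        dist = abs(pa[0] - pb[0]) + abs(pa[1] - pb[1])
        if dist <= max_link_length:
            links.add((a, b))
    return links


def average_hops(positions, links):
    """Mean shortest-path hop count over all ordered chiplet pairs.

    Returns infinity if the inferred topology is disconnected.
    """
    neighbors = {c: set() for c in positions}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total, pairs = 0, 0
    for src in positions:
        # Breadth-first search from src over the inferred links.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        if len(dist) < len(positions):
            return float("inf")
        total += sum(dist.values())
        pairs += len(positions) - 1
    return total / pairs if pairs else 0.0


# Hypothetical 2x2 placement on a unit grid.
placement = {"c0": (0, 0), "c1": (1, 0), "c2": (0, 1), "c3": (1, 1)}
links = infer_topology(placement, max_link_length=1)
print(sorted(links), average_hops(placement, links))
```

In this toy setting, an outer optimizer could move chiplets, re-infer the links for every candidate placement, and keep the placement with the lowest cost, which mirrors the joint optimization described above only at a very coarse level.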