Rapidly evolving artificial intelligence and machine learning applications require ever-increasing computational capabilities, while monolithic 2D design technologies approach their limits. Heterogeneous integration of smaller chiplets using 2.5D silicon interposers and 3D packaging has emerged as a promising paradigm to overcome these limits and meet performance demands. These approaches offer significantly lower cost and higher manufacturing yield than monolithic 2D integrated circuits. However, the compact arrangement and high compute density exacerbate thermal management challenges, potentially compromising performance. Accurate and fast thermal modeling is therefore critical, especially as system sizes grow and different design stages require different levels of accuracy and speed. Since no single thermal modeling technique meets all these needs, this paper introduces MFIT, a set of multi-fidelity thermal models that balance accuracy and speed. These models can enable efficient design space exploration and runtime thermal management. Extensive evaluation on systems with 16, 36, and 64 2.5D-integrated chiplets and 16x3 3D-integrated chiplets demonstrates that these models can reduce execution times from days to seconds or milliseconds with negligible loss in accuracy.