Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results on a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach achieves an average reduction of 10% (7%) in routed wirelength and 31% (34%) in total negative slack (TNS), with faster global placement and on-par total runtime relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further demonstrate that the post-route improvements over RePlAce and DREAMPlace extend beyond the motivating application of machine learning accelerators.