Advanced packaging offers a new design paradigm in the post-Moore era, in which many small chiplets can be assembled into a large system to achieve extreme scalability and cost reduction. Recently proposed chiplet-based DNN accelerators demonstrate the effectiveness of this paradigm but do not explore the tradeoffs between power, performance, and area (PPA) and fabrication cost. Specifically, both the architectural design space of individual chiplets and the integration options for assembling them should be explored together. More advanced (and more costly) packaging technology can enhance connectivity, but may in turn reduce the budget left for the chiplets themselves. In this paper, we propose ALEGO, an architecture-and-integration co-design approach for chiplet-based spatial accelerators. Based on a heterogeneous integration paradigm, ALEGO can optimize each chiplet design for different workloads to achieve better efficiency. The co-design is enabled by a uniform encoding of architecture and integration and by a systematic design space exploration flow. We develop an architecture modeling framework and an ML-based approach to optimize the design parameters. Experiments demonstrate that ALEGO achieves a 24%, 16%, or 23% improvement in latency, energy, or cost, respectively, compared with the best of separate architecture or integration optimization.