Advanced packaging offers a new design paradigm in the post-Moore era, in which many small chiplets are assembled into a large system. Through heterogeneous integration, a chiplet-based accelerator can be highly specialized for a specific workload, achieving high efficiency at reduced cost. Fully leveraging this potential requires exploring both the architectural design space of individual chiplets and the integration options for assembling them, neither of which existing proposals have fully exploited. This paper proposes Monad, a cost-aware specialization approach for chiplet-based spatial accelerators that explores the tradeoffs between power, performance, and area (PPA) and fabrication cost. To evaluate a specialized system, we introduce a modeling framework that accounts for non-uniformity in dataflow, pipelining, and communication when multiple tensor workloads execute on different chiplets. We combine the architecture and integration design spaces by uniformly encoding the design aspects of both and exploring them with a systematic ML-based approach. Experiments show that Monad reduces energy-delay product (EDP) by an average of 16% and 30% compared with the state-of-the-art chiplet-based accelerators Simba and NN-Baton, respectively.
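The headline metric above is a relative reduction in energy-delay product. As a minimal sketch of how such a figure can be computed, assuming EDP is the product of energy and delay and that the reported average is an arithmetic mean over workloads (the abstract does not state the averaging method), the following Python may help. The names `Measurement`, `edp_reduction`, and `mean_edp_reduction` are hypothetical, and the numbers are illustrative placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Measurement:
    """Energy and delay of one workload on one accelerator (hypothetical type)."""
    energy_joules: float
    delay_seconds: float

    @property
    def edp(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_joules * self.delay_seconds


def edp_reduction(baseline: Measurement, candidate: Measurement) -> float:
    """Fractional EDP reduction of candidate relative to baseline (0.2 means 20%)."""
    if baseline.edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - candidate.edp / baseline.edp


def mean_edp_reduction(pairs: Iterable[Tuple[Measurement, Measurement]]) -> float:
    """Arithmetic mean of per-workload reductions (an assumed averaging choice)."""
    reductions = [edp_reduction(base, cand) for base, cand in pairs]
    if not reductions:
        raise ValueError("need at least one (baseline, candidate) pair")
    return sum(reductions) / len(reductions)


# Illustrative placeholder numbers only; they are not taken from the paper.
example_pairs = [
    (Measurement(2.0, 1.0), Measurement(1.6, 1.0)),  # same delay, less energy
    (Measurement(4.0, 2.0), Measurement(3.0, 2.0)),  # same delay, less energy
]
example_mean = mean_edp_reduction(example_pairs)
```

Because EDP multiplies energy by delay, a design that saves energy but runs slower can still show no reduction, which is why the metric is commonly used to compare accelerators on both axes at once.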