Current design-space exploration tools cannot accurately evaluate communication-intensive applications whose execution is data-dependent (e.g., graph analytics and sparse linear algebra) on scale-out manycore systems, because they either lack scalability or model the network in insufficient detail. This paper presents Muchisim, a novel parallel simulator designed to address the challenges of exploring the design space of distributed multi-chiplet manycore architectures for communication-intensive applications. We evaluate Muchisim on simulations of systems with up to a million interconnected processing elements (PEs), with data movement and communication modeled in a cycle-accurate manner. In addition to performance, Muchisim reports the energy, area, and cost of the simulated system, and it comes with a benchmark application suite and two data visualization tools. Muchisim supports various parallelization strategies and communication primitives, such as task-based parallelization and message passing, making it highly relevant for architectures with software-managed coherence and distributed memory. Through a case study, we show that Muchisim helps users explore the balance between memory and computation units, as well as the constraints related to chiplet integration and inter-chip communication. Muchisim makes it possible to scale up the systems on which new techniques or design parameters are evaluated, opening the door to further research in this area.