The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered by the limited scalability or accuracy of existing frameworks when simulating data-dependent execution patterns. This paper presents MuchiSim, a novel parallel simulator designed to address these challenges when exploring the design space of distributed multi-chiplet manycore architectures. We evaluate MuchiSim on simulations of systems with up to a million interconnected processing units (PUs), modeling data movement and communication cycle by cycle. In addition to performance, MuchiSim reports the energy, area, and cost of the simulated system. It also comes with a benchmark application suite and two data visualization tools. MuchiSim supports various parallelization strategies and communication primitives, such as task-based parallelization and message passing, which makes it highly relevant for architectures with software-managed coherence and distributed memory. Via a case study, we show that MuchiSim helps users explore the balance between memory and computation units, as well as the constraints related to chiplet integration and inter-chip communication. MuchiSim enables evaluating new techniques or design parameters for systems at scales that are more realistic for modern parallel systems, opening the door to further research in this area.