Hardware development depends critically on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical because single-core performance has stagnated. Parendi is an RTL simulator that addresses this challenge by exploiting the abundant fine-grained parallelism inherent in RTL simulation and mapping it efficiently onto the massively parallel Graphcore IPU (Intelligence Processing Unit) architecture. Parendi scales up to 5888 cores on 4 Graphcore IPU sockets and runs large RTL designs up to 4$\times$ faster than the most powerful state-of-the-art x64 multicore systems. To achieve this performance, we developed new partitioning and compilation techniques and carefully quantified the synchronization, communication, and computation costs of parallel RTL simulation. The paper comprehensively analyzes these factors and details the strategies that Parendi uses to optimize them.