The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1 to 1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronization no longer incur runtime overhead, making fine-grain parallelism practical. Moreover, static scheduling dramatically simplifies processor implementation, significantly increasing the number of cores that fit on a chip. Our 225-core FPGA implementation running at 475 MHz outperforms a state-of-the-art RTL simulator running on desktop and server computers in 8 out of 9 benchmarks.
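To make the static BSP idea concrete, the following is a minimal Python sketch of one way a statically scheduled superstep could map onto an RTL clock cycle. It is not Manticore's compiler, instruction set, or hardware. The two-register design (`count`, `acc`), the two-core partition, and the `COMPUTE` and `SENDS` tables are all hypothetical and written by hand here, standing in for what a compiler would emit.

```python
# Hypothetical two-register design; one RTL clock cycle = one BSP superstep.
#   count' = (count + 1) mod 16
#   acc'   = (acc + count) mod 256

def next_count(local):
    return (local["count"] + 1) & 0xF

def next_acc(local):
    return (local["acc"] + local["count"]) & 0xFF

# Static schedule, fixed before simulation starts (assumed, hand-written):
# which core computes which register, and which values move between cores
# at the end of every superstep. Nothing is decided at run time.
COMPUTE = {0: [("count", next_count)], 1: [("acc", next_acc)]}
SENDS = [(0, "count", 1)]  # (source core, register, destination core)

def superstep(cores):
    # Compute phase: each core reads only its own local register copies.
    updates = {
        cid: {reg: fn(cores[cid]) for reg, fn in COMPUTE[cid]}
        for cid in cores
    }
    # Commit the new register values on their owning cores.
    for cid, regs in updates.items():
        cores[cid].update(regs)
    # Communication phase: follow the fixed send list, no locks or queues.
    for src, reg, dst in SENDS:
        cores[dst][reg] = cores[src][reg]

def simulate(cycles):
    # Core 1 holds a local copy of `count`, refreshed once per superstep.
    cores = {0: {"count": 0}, 1: {"acc": 0, "count": 0}}
    for _ in range(cycles):
        superstep(cores)
    return cores[0]["count"], cores[1]["acc"]

def reference(cycles):
    # Plain sequential model of the same design, used as a cross-check.
    count, acc = 0, 0
    for _ in range(cycles):
        count, acc = (count + 1) & 0xF, (acc + count) & 0xFF
    return count, acc
```

Calling `simulate(n)` and `reference(n)` with the same `n` should give the same register values. The point of the sketch is that the only coordination between cores is the end of the superstep, which is the property the abstract attributes to static scheduling.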