The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1 to 1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronization no longer incur runtime overhead, making fine-grain parallelism practical. Moreover, static scheduling dramatically simplifies processor implementation, significantly increasing the number of cores that fit on a chip. Our 225-core FPGA implementation running at 475 MHz outperforms a state-of-the-art RTL simulator running on desktop and server computers in 8 out of 9 benchmarks.
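To make the static BSP idea concrete, here is a minimal sketch of one way a statically scheduled bulk-synchronous simulation step could look. It is an illustration under assumptions, not Manticore's actual instruction set, compiler, or hardware: the `Core` class, the `bsp_step` and `simulate` functions, and the toy two-register design (`a` counts up, `b` accumulates `a`) are all hypothetical names invented for this example. Each simulated RTL cycle is treated as one superstep: every core first computes from its own local registers only, then a message list fixed ahead of time is delivered, and a barrier commits the new state.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """A hypothetical simulation core with a fixed, precomputed schedule."""
    regs: dict     # local register file (name -> value)
    program: list  # scheduled ops: (dest, func, [source register names])
    sends: list    # fixed messages: (local_reg, target_core, target_reg)


def bsp_step(cores):
    """Simulate one RTL clock cycle as a single BSP superstep."""
    # Compute phase: each core reads only its own pre-cycle registers,
    # so no locks or runtime synchronization are needed here.
    staged = []
    for core in cores:
        new_regs = dict(core.regs)
        for dest, func, srcs in core.program:
            new_regs[dest] = func(*(core.regs[s] for s in srcs))
        staged.append(new_regs)

    # Communication phase: the message list is known before the run,
    # so there is no runtime arbitration. Gather first, then deliver.
    messages = [
        (target, target_reg, staged[idx][local_reg])
        for idx, core in enumerate(cores)
        for local_reg, target, target_reg in core.sends
    ]
    for target, target_reg, value in messages:
        staged[target][target_reg] = value

    # Barrier: all cores commit their next state together.
    for core, new_regs in zip(cores, staged):
        core.regs = new_regs


def simulate(cycles):
    """Run the toy design, partitioned across two cores."""
    # Assumed toy design: a' = a + 1 (8-bit), b' = b + a (16-bit).
    core0 = Core(
        regs={"a": 0},
        program=[("a", lambda a: (a + 1) & 0xFF, ["a"])],
        sends=[("a", 1, "a_copy")],  # core 1 keeps a local copy of a
    )
    core1 = Core(
        regs={"b": 0, "a_copy": 0},
        program=[("b", lambda b, a: (b + a) & 0xFFFF, ["b", "a_copy"])],
        sends=[],
    )
    cores = [core0, core1]
    for _ in range(cycles):
        bsp_step(cores)
    return core0.regs["a"], core1.regs["b"]


def reference(cycles):
    """Single-threaded reference model of the same toy design."""
    a, b = 0, 0
    for _ in range(cycles):
        a, b = (a + 1) & 0xFF, (b + a) & 0xFFFF
    return a, b
```

Calling `simulate(10)` and `reference(10)` should give the same register values, which is the property a partitioning scheme of this kind has to preserve. The sketch runs the cores sequentially in a loop; the point is only that the compute phase touches no shared state and the communication pattern is fixed in advance, which is what would let real cores run it in parallel without fine-grain synchronization.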