High-Level Synthesis allows hardware designers to create complex RTL designs using C/C++. The traditional HLS workflow involves iterations of C/C++ simulation for partial functional verification and HLS synthesis for coarse timing estimates. However, neither C/C++ simulation nor HLS synthesis estimates can account for complex behaviors like FIFO interactions and pipeline stalls, thereby obscuring problems like deadlocks and latency overheads. Such problems are revealed only through C/RTL co-simulation, which is typically orders of magnitude slower than either C/C++ simulation or HLS synthesis, far too slow to integrate into the edit-run development cycle. Addressing this, we propose LightningSim, a fast simulation tool for HLS that combines the speed of native C/C++ with the accuracy of C/RTL co-simulation. LightningSim directly operates on the LLVM intermediate representation (IR) code and accurately simulates a hardware design's dynamic behavior. First, it traces LLVM IR execution to capture the run-time information; second, it maps the static HLS scheduling information to the trace to simulate the dynamic behavior; third, it calculates stalls and deadlocks from inter-function interactions to get precise cycle counts. Evaluated on 33 benchmarks, LightningSim produces 99.9%-accurate timing estimates up to 95x faster than RTL simulation. Our code is publicly available on GitHub.
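The third step described above, deriving stalls and deadlocks from inter-function FIFO interactions, can be illustrated with a toy model. The sketch below is not LightningSim's implementation. It assumes that each module's trace has already been reduced to FIFO events tagged with their statically scheduled cycle, that every FIFO has exactly one writer and one reader, that a value becomes readable one cycle after it is written, and that a slot is freed one cycle after it is read. The names `simulate`, `Deadlock`, and the event tuple layout are hypothetical.

```python
from collections import defaultdict


class Deadlock(Exception):
    """Raised when some module can never complete its next FIFO access."""


def simulate(modules, depths):
    """Resolve FIFO stalls for a set of concurrently running modules.

    modules: {name: (events, static_latency)}, where events is a list of
             (static_cycle, "read" | "write", fifo_name) in program order.
    depths:  {fifo_name: depth}.
    Returns  {name: finishing cycle including stalls}.
    """
    write_times = defaultdict(list)  # fifo -> cycle of each resolved write
    read_times = defaultdict(list)   # fifo -> cycle of each resolved read
    next_event = {name: 0 for name in modules}
    stall = {name: 0 for name in modules}  # accumulated stall cycles

    progressed = True
    while progressed:
        progressed = False
        for name, (events, _) in modules.items():
            while next_event[name] < len(events):
                static_cycle, kind, fifo = events[next_event[name]]
                cycle = static_cycle + stall[name]
                if kind == "write":
                    k = len(write_times[fifo])
                    freeing_read = k - depths[fifo]  # read that frees a slot
                    if freeing_read >= 0:
                        if freeing_read >= len(read_times[fifo]):
                            break  # FIFO full; that read is not resolved yet
                        cycle = max(cycle, read_times[fifo][freeing_read] + 1)
                    write_times[fifo].append(cycle)
                else:
                    k = len(read_times[fifo])
                    if k >= len(write_times[fifo]):
                        break  # FIFO empty; matching write not resolved yet
                    cycle = max(cycle, write_times[fifo][k] + 1)
                    read_times[fifo].append(cycle)
                stall[name] = cycle - static_cycle
                next_event[name] += 1
                progressed = True

    blocked = [n for n, (ev, _) in modules.items() if next_event[n] < len(ev)]
    if blocked:
        raise Deadlock("blocked modules: " + ", ".join(blocked))
    return {n: latency + stall[n] for n, (_, latency) in modules.items()}


# A producer writing every cycle into a depth-1 FIFO drained every 2 cycles.
design = {
    "producer": ([(c, "write", "s") for c in range(4)], 4),
    "consumer": ([(c, "read", "s") for c in range(0, 8, 2)], 8),
}
print(simulate(design, {"s": 1}))  # {'producer': 7, 'consumer': 9}
```

In this example the static schedule alone would report latencies of 4 and 8 cycles; the stall resolution shows the producer finishing at cycle 7 because it is throttled by the slower consumer. Two modules that each read from the other before writing never make progress, and the function raises `Deadlock` instead of returning a cycle count.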