Dynamically scheduled high-level synthesis (HLS) achieves higher throughput than static HLS for code with unpredictable memory accesses and control flow. However, excessive dataflow scheduling results in circuits that use more resources and have a slower critical path, even when only part of the circuit exhibits dynamic behavior. Recent work has shown that marking parts of a dataflow circuit for static scheduling (hybrid scheduling) can save resources and improve performance, but the dynamic part of the circuit still bottlenecks the critical path. We propose instead to selectively introduce dynamic scheduling into static HLS. This paper presents an algorithm for identifying code regions amenable to dynamic scheduling and a methodology for introducing dynamically scheduled basic blocks, loops, and memory operations into static HLS. Our algorithm is informed by modulo scheduling and can be integrated into any modulo-scheduled HLS tool. On a set of ten benchmarks, our approach achieves, on average, speedups of up to 3.7$\times$ and 3$\times$ over dynamic and hybrid scheduling, respectively, with an area overhead of 1.3$\times$ and a frequency degradation to 0.74$\times$ relative to static HLS.