Designing accelerators for resource- and power-constrained applications is a daunting task. High-level synthesis (HLS) addresses these constraints through resource sharing, an optimization at the HLS binding stage that maps multiple operations to the same functional unit. However, resource sharing is often limited to reusing instructions within a basic block. Rather than searching globally for the best control and dataflow graphs (CDFGs) to combine, it is constrained by existing instruction mappings and schedules. Coarse-grained function merging (CGFM) at the intermediate representation (IR) level can reuse control and dataflow patterns without dealing with the post-scheduling complexity of mapping operations onto functional units, wires, and registers. The merged functions produced by CGFM can be translated to RTL by HLS, yielding coarse-grained merged accelerators (CGMAs). CGMAs are especially profitable across applications with similar data- and control-flow patterns. Prior work has used CGFM to generate CGMAs without considering which CGFM algorithms best optimize area, power, and energy costs. We propose Guac, an energy-aware, static single assignment (SSA)-based CGMA generation methodology. Guac implements a novel ensemble of cost models for efficient CGMA generation. We also show that CGFM algorithms that use SSA form to merge CDFGs outperform prior non-SSA CGFM designs. We demonstrate significant area, power, and energy savings with respect to the state of the art. In particular, Guac more than doubles the energy savings of the closest related work, even when measured against a strong resource-sharing baseline.