Coarse-Grain Reconfigurable Arrays (CGRAs) are emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The acceleration a CGRA can provide depends on the quality of the mapping, that is, on how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques use modulo scheduling to minimize the Iteration Interval (II) and rely on graph algorithms such as Max-Clique Enumeration to solve the mapping problem. Our work, SAT-MapIt, instead approaches the mapping problem through a satisfiability (SAT) formulation. We introduce the Kernel Mobility Schedule (KMS), an ad-hoc schedule used together with the Data Flow Graph and CGRA architectural information to generate Boolean statements that, when satisfied, yield a valid mapping. Experimental results show that SAT-MapIt outperforms SoA alternatives in almost 50\% of the explored benchmarks. Additionally, we evaluate the mapping results on a synthesizable CGRA design and emphasize the trends of run-time metrics, i.e., energy efficiency and latency, across different CILs and CGRA sizes. We show that a hardware-agnostic analysis performed on compiler-level metrics can optimally prune the architectural design space while still retaining Pareto-optimal configurations. Moreover, by exploring how implementation details impact cost and performance on real hardware, we highlight the importance of holistic software-to-hardware mapping flows such as the one presented herein.