Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices that promise both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support. FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that the CGRA does support. We use equality saturation, a technique that enables efficient exploration of a large space of rewrite rules, to search the program space for supported programs. We apply FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs, and demonstrate a 2.2$\times$ increase in the number of loop kernels accelerated, leading to a 3$\times$ speedup compared to an Arm A5 CPU on kernels that the accelerator would otherwise not support.
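To make the idea of dataflow rewriting concrete, the following is a minimal sketch and not FlexC's implementation. It assumes a toy expression format (nested tuples of binary operations), a hypothetical rule set (`x * 2` rewritten to `x << 1` or `x + x`, plus commutativity of multiplication), and hypothetical function names. It also swaps in a bounded breadth-first search over whole expressions in place of equality saturation, which would instead store all equivalent forms compactly in an e-graph.

```python
from collections import deque

# Toy dataflow expressions: a leaf is a variable name (str) or a constant (int);
# an inner node is a tuple (op, left, right). All names here are hypothetical.

def rewrites(expr):
    """Yield every expression reachable from expr by one rewrite step."""
    if not isinstance(expr, tuple):
        return  # leaves cannot be rewritten
    op, a, b = expr
    if op == "mul" and b == 2:
        yield ("shl", a, 1)   # assumed rule: x * 2 == x << 1
        yield ("add", a, a)   # assumed rule: x * 2 == x + x
    if op == "mul":
        yield ("mul", b, a)   # assumed rule: multiplication commutes
    # Apply the rules inside either operand as well.
    for new_a in rewrites(a):
        yield (op, new_a, b)
    for new_b in rewrites(b):
        yield (op, a, new_b)

def ops_used(expr):
    """Return the set of operation names appearing in expr."""
    if not isinstance(expr, tuple):
        return set()
    op, a, b = expr
    return {op} | ops_used(a) | ops_used(b)

def find_supported(expr, supported, max_nodes=1000):
    """Search for an equivalent expression that uses only supported operations.

    Returns the first such expression found, or None if the bounded search
    does not find one.
    """
    seen = {expr}
    queue = deque([expr])
    while queue and len(seen) <= max_nodes:
        current = queue.popleft()
        if ops_used(current) <= supported:
            return current
        for candidate in rewrites(current):
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return None

# A kernel body computing 2*x + y, written with the constant first.
kernel = ("add", ("mul", 2, "x"), "y")
# A hypothetical CGRA with adders and shifters but no multiplier.
print(find_supported(kernel, {"add", "shl"}))
```

In this sketch the search first commutes the multiplication so that the constant is on the right, and only then can a strength-reduction rule fire. Chains of rewrites like this are the reason a systematic search over rule applications is useful, rather than a single fixed pass.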