NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures

Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform that balances the performance and efficiency of specialized accelerators with software programmability. However, their full potential is severely hindered by control flow in accelerated kernels, because control flow (e.g., loops, branches) is fundamentally incompatible with the parallel, data-driven CGRA fabric. Prior strategies for resolving this mismatch are either inefficient, sacrificing performance for generality, or lack generality because they are difficult to adapt across execution models. A general and unified solution for efficient CGRA kernel acceleration therefore remains elusive. This paper introduces NEURA, a unified and retargetable compilation framework that systematically resolves the control-dataflow mismatch in CGRAs. NEURA's core innovation is a novel, pure dataflow intermediate representation (IR) built on a predicated type system. In this IR, the control context is embedded as a predicate within each data value, making control an intrinsic property of data. This mechanism enables NEURA to systematically flatten complex control flow into a single unified dataflow graph. The unified representation decouples kernel representation from hardware, allowing NEURA to retarget diverse CGRAs with different execution models and microarchitectural features. When targeted to a high-performance spatio-temporal CGRA, NEURA delivers a 2.20x speedup on kernel benchmarks and up to 2.71x geometric mean speedup on real-world applications over state-of-the-art (SOTA) high-performance baselines. When retargeted to a spatial-only CGRA, it remains competitive with the SOTA low-power CGRA. NEURA is open-source and available at https://github.com/coredac/neura.
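
The idea of carrying control as a predicate inside each data value can be illustrated with a minimal Python sketch. This is not NEURA's IR or API: the names `PredValue`, `binop`, `gate`, `merge`, and `kernel` are hypothetical and chosen only for illustration. The sketch flattens the branch `y = x * 2 if x > 0 else x - 1` into straight-line dataflow in which both sides are always evaluated and the predicate decides which result is valid.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PredValue:
    """A data value paired with a validity predicate (hypothetical type)."""
    value: int
    valid: bool


def binop(fn: Callable[[int, int], int], a: PredValue, b: PredValue) -> PredValue:
    # The result is valid only if both operands are valid.
    return PredValue(fn(a.value, b.value), a.valid and b.valid)


def gate(data: PredValue, cond: PredValue, when: bool) -> PredValue:
    # Keep the payload, but mark it valid only when the condition is
    # itself valid and its truth value matches the requested branch.
    return PredValue(data.value, data.valid and cond.valid and bool(cond.value) == when)


def merge(a: PredValue, b: PredValue) -> PredValue:
    # Pick the valid input; at most one side is valid after gating.
    return a if a.valid else b


def kernel(x_raw: int) -> PredValue:
    # Flattened form of: y = x * 2 if x > 0 else x - 1
    x = PredValue(x_raw, True)
    cond = binop(lambda p, q: int(p > q), x, PredValue(0, True))
    x_then = gate(x, cond, True)    # valid only on the "then" path
    x_else = gate(x, cond, False)   # valid only on the "else" path
    then_val = binop(lambda p, q: p * q, x_then, PredValue(2, True))
    else_val = binop(lambda p, q: p - q, x_else, PredValue(1, True))
    return merge(then_val, else_val)
```

Under these assumptions, `kernel(3)` yields a valid value of 6 and `kernel(-4)` yields a valid value of -5, with no branch instruction anywhere in the graph. Real hardware would use the predicate to suppress side effects of the invalid path rather than merely discard its result.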
