Reconfigurable computing offers a good balance between flexibility and energy efficiency. When a reconfigurable device is combined with a software-programmable device such as a CPU, higher performance can be obtained by spatially distributing the parallelizable sections of an application across the reconfigurable fabric while the CPU handles the control-intensive sections. This work introduces an elastic Coarse-Grained Reconfigurable Architecture (CGRA) integrated into an energy-efficient RISC-V-based SoC designed for the embedded domain. The microarchitecture of the CGRA supports conditionals and irregular loops, making it adaptable to domain-specific applications. Additionally, we propose specific mapping strategies that enable efficient utilization of the CGRA both for simple applications, where the fabric is reconfigured only once (one-shot kernels), and for more complex ones, where the CGRA must be reconfigured multiple times to complete them (multi-shot kernels). Large kernels also benefit from the independent memory nodes incorporated to streamline data accesses. Integrating the CGRA as an accelerator for the RISC-V processor yields a versatile and efficient framework that provides adaptability, processing capacity, and overall performance across various applications. The design has been implemented in TSMC 65 nm technology, achieving a maximum frequency of 250 MHz. It reaches a peak performance of 1.22 GOPs when computing one-shot kernels and 1.17 GOPs when computing multi-shot kernels. The best energy efficiency is 72.68 MOPs/mW for one-shot kernels and 115.96 MOPs/mW for multi-shot kernels. The design integrates power-gating and clock-gating techniques to tailor the architecture to the embedded domain while maintaining performance. The best speed-ups are 17.63x and 18.61x for one-shot and multi-shot kernels, respectively. The best energy savings in the SoC are 9.05x and 11.10x for one-shot and multi-shot kernels, respectively.