The scientific community increasingly relies on machine learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand accelerators that combine extremely high performance with programmability, ease of integration, and straightforward verification. We present cgra4ml, an open-source, modular framework that generates parameterizable coarse-grained reconfigurable array (CGRA) accelerators in synthesizable SystemVerilog RTL, tailored to common ML compute patterns found in scientific applications. The framework supports seamless system integration through AXI-compliant interfaces and open-source DMA components, and it automatically generates the firmware that programs the accelerator. A comprehensive verification suite and a runtime firmware stack further support deployment across diverse SoC platforms. cgra4ml provides a modular, full-stack infrastructure, comprising a Python API, SystemVerilog hardware, TCL toolflows, and a C runtime, that eases integration and experimentation, allowing scientists to focus on innovation rather than on the intricacies of hardware design and optimization. We demonstrate the effectiveness of cgra4ml in implementing common scientific edge neural networks using both ASIC and FPGA design flows.