The scientific community increasingly relies on machine learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand accelerators that combine high performance with programmability, ease of integration, and straightforward verification. We present cgra4ml, an open-source, modular framework that generates parameterizable coarse-grained reconfigurable array (CGRA) accelerators in synthesizable SystemVerilog RTL, tailored to the ML compute patterns common in scientific applications. The framework supports system integration through AXI-compliant interfaces and open-source DMA components, and it automatically generates the firmware used to program the accelerator. A comprehensive verification suite and a runtime firmware stack further support deployment across diverse SoC platforms. cgra4ml provides a full-stack infrastructure comprising a Python API, SystemVerilog hardware, TCL toolflows, and a C runtime, which simplifies experimentation and lets scientists focus on innovation rather than on the intricacies of hardware design and optimization. We demonstrate the effectiveness of cgra4ml by implementing common scientific edge neural networks using both ASIC and FPGA design flows.