Driven by the increasing demand for low-latency and real-time processing, machine learning applications are steadily migrating toward edge computing platforms, where Field-Programmable Gate Arrays (FPGAs) are widely adopted for their energy efficiency compared to CPUs and GPUs. To generate high-performance and low-power FPGA designs, several frameworks built upon High-Level Synthesis (HLS) vendor tools have been proposed, among which Multi-Level Intermediate Representation (MLIR)-based frameworks are gaining significant traction due to their extensibility and ease of use. However, existing state-of-the-art frameworks often overlook the stringent resource constraints of edge devices. To address this limitation, we propose MING, an MLIR-based framework that abstracts and automates the HLS design process. Within this framework, we adopt a streaming architecture with carefully managed buffers, specifically designed to handle resource constraints while ensuring low latency. In comparison with recent frameworks, our approach achieves on average a 15x speedup for standard Convolutional Neural Network (CNN) kernels with up to four layers, and up to a 200x speedup for single-layer kernels. For kernels with larger input sizes, MING generates efficient designs that respect hardware resource constraints, whereas state-of-the-art frameworks struggle to satisfy them.