Artificial intelligence (AI) is increasingly deployed in real-time and energy-constrained environments, driving demand for hardware platforms that can deliver high performance and power efficiency. While central processing units (CPUs) and graphics processing units (GPUs) have traditionally served as the primary inference engines, their general-purpose nature often leads to inefficiencies under strict latency or power budgets. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative by enabling custom-tailored parallelism and hardware-level optimizations. However, mapping AI workloads to FPGAs remains challenging due to the complexity of hardware-software co-design and data orchestration. This paper presents AI FPGA Agent, an agent-driven framework that simplifies the integration and acceleration of deep neural network inference on FPGAs. The proposed system employs a runtime software agent that dynamically partitions AI models, schedules compute-intensive layers for hardware offload, and manages data transfers with minimal developer intervention. The hardware component includes a parameterizable accelerator core optimized for high-throughput inference using quantized arithmetic. Experimental results demonstrate that the AI FPGA Agent achieves over 10x latency reduction compared to CPU baselines and 2-3x higher energy efficiency than GPU implementations, all while preserving classification accuracy within 0.2% of full-precision references. These findings underscore the potential of AI-FPGA co-design for scalable, energy-efficient AI deployment.
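As a rough illustration of the partitioning step the abstract describes, the runtime agent could rank layers by compute intensity and route only the heavy ones to the FPGA accelerator. The sketch below is hypothetical: the `Layer` type, the MAC-count heuristic, and the threshold are illustrative assumptions, not the paper's actual API or scheduling policy.

```python
# Hypothetical sketch of the runtime agent's partitioning step: layers whose
# compute cost (approximated here by multiply-accumulate count) exceeds a
# threshold are scheduled for FPGA offload; the rest stay on the CPU.
# All names and numbers are illustrative, not taken from the paper.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate operations, a proxy for compute intensity

def partition(layers, fpga_threshold_macs=1_000_000):
    """Assign each layer to 'fpga' or 'cpu' based on its MAC count."""
    return {
        layer.name: "fpga" if layer.macs >= fpga_threshold_macs else "cpu"
        for layer in layers
    }

model = [
    Layer("conv1", macs=118_000_000),  # heavy convolution -> offload candidate
    Layer("relu1", macs=300_000),      # cheap elementwise op -> keep on CPU
    Layer("fc1",   macs=4_100_000),    # dense layer -> offload candidate
]
print(partition(model))
```

A real agent would also weigh data-transfer cost against on-chip compute savings before committing a layer to hardware, since frequent host-to-FPGA copies can erase the offload benefit; this sketch captures only the compute-intensity side of that trade-off.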