FPGAs offer high performance, low latency, and energy efficiency for accelerated computing, yet adoption in scientific and edge settings is limited by the specialized hardware expertise required. High-level synthesis (HLS) improves productivity over hardware description languages (HDLs), but competitive designs still demand hardware-aware optimizations and careful dataflow design. We introduce LAAFD, an agentic workflow that uses large language models to translate general-purpose C++ into optimized Vitis HLS kernels. LAAFD automates key transformations (deep pipelining, vectorization, and dataflow partitioning) and closes the loop with HLS co-simulation and synthesis feedback, verifying correctness while iteratively reducing execution time measured in cycles. Over a suite of 15 kernels representing common compute patterns in HPC, LAAFD achieves 99.9% of the performance of hand-tuned Vitis HLS baselines (geometric mean). For stencil workloads, LAAFD matches the performance of SODA, a state-of-the-art DSL-based HLS code generator for stencil solvers, while yielding more readable kernels. These results suggest that LAAFD substantially lowers the expertise barrier to FPGA acceleration without sacrificing efficiency.
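To make the three transformations named in the abstract concrete, the following is a minimal hand-written sketch, not LAAFD output and not one of the 15 evaluated kernels. The kernel (`saxpy_kernel`), its helper stages, and the problem size `N` are hypothetical names chosen for illustration. The `PIPELINE`, `UNROLL`, and `DATAFLOW` pragmas are standard Vitis HLS directives; an ordinary C++ compiler ignores them, so the code also runs as plain software.

```cpp
#include <cstddef>

// Hypothetical problem size, chosen only for this illustration.
constexpr std::size_t N = 1024;

// Stage 1: scale the input vector.
// PIPELINE II=1 asks the tool to start one loop iteration per clock cycle.
static void scale(const float in[N], float tmp[N], float alpha) {
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        tmp[i] = alpha * in[i];
    }
}

// Stage 2: add the second vector.
// UNROLL factor=4 replicates the loop body four times, a simple form of
// vectorization. A real design would likely also need partitioned arrays
// or wider memory ports to feed the four lanes.
static void add(const float tmp[N], const float y[N], float out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
        out[i] = tmp[i] + y[i];
    }
}

// Top-level kernel: out = alpha * x + y.
// DATAFLOW lets the two stages run as concurrent tasks connected by the
// intermediate buffer instead of executing strictly one after the other.
void saxpy_kernel(const float x[N], const float y[N], float out[N],
                  float alpha) {
#pragma HLS DATAFLOW
    float tmp[N];
    scale(x, tmp, alpha);
    add(tmp, y, out);
}
```

Whether these directives actually reduce the cycle count depends on the target device and memory interface, which is the kind of information the co-simulation and synthesis feedback described above would supply.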