In the last two years, large language models (LLMs) have shown strong capabilities in code generation, including hardware design at the register-transfer level (RTL). While their use in high-level synthesis (HLS) remains comparatively immature, the ratio of HLS- to RTL-focused studies has shifted from 1:10 to 2:10 in the past six months, indicating growing interest in leveraging LLMs for high-level design entry while relying on downstream synthesis for optimization. This trend highlights the need for a comprehensive benchmarking and evaluation framework dedicated to LLM-based HLS. To address this, we present Bench4HLS, a framework for evaluating LLM-generated HLS designs. Bench4HLS comprises 170 manually drafted and validated case studies, spanning small kernels to complex accelerators, curated from widely used public repositories. The framework supports fully automated assessment of compilation success, functional correctness via simulation, and synthesis feasibility and optimization. Crucially, Bench4HLS integrates a pluggable API for power, performance, and area (PPA) analysis across HLS toolchains and architectures, demonstrated here with Xilinx Vitis HLS and validated on Catapult HLS. By providing a structured, extensible, and plug-and-play testbed, Bench4HLS establishes a foundational methodology for benchmarking LLMs in HLS workflows.