The growing popularity of large language models (LLMs) has opened the way for their application in diverse domains. This paper proposes a benchmarking framework for evaluating LLM performance on Verilog code generation for hardware design and verification. We present an evaluation dataset of 156 problems drawn from the Verilog instructional website HDLBits. The problems cover a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. Generated Verilog completions are tested automatically for functional correctness by comparing the transient simulation outputs of the generated design against those of a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models can be improved with supervised fine-tuning by bootstrapping with LLM-generated synthetic problem-code pairs.
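The functional-correctness check described above can be illustrated with a minimal sketch. It is not the benchmark's actual test harness: the trace format (one dictionary of output values per sampled time step) and the function names `count_mismatches` and `is_functionally_correct` are assumptions made for illustration, and a real flow would obtain the traces by simulating both designs under the same testbench stimulus.

```python
from typing import Dict, List

# Hypothetical trace format (an assumption, not the benchmark's format):
# one dict per sampled time step, mapping an output signal name to its
# value as a string ("0", "1", "x", "z", or a bit vector).
Trace = List[Dict[str, str]]


def count_mismatches(golden: Trace, candidate: Trace) -> int:
    """Count time steps where the candidate's outputs differ from the golden ones."""
    # Traces of different length cannot be aligned sample by sample, so every
    # step of the longer trace is counted as a mismatch.
    if len(golden) != len(candidate):
        return max(len(golden), len(candidate))
    return sum(1 for g, c in zip(golden, candidate) if g != c)


def is_functionally_correct(golden: Trace, candidate: Trace) -> bool:
    """A candidate passes only if it matches the golden trace at every step."""
    return count_mismatches(golden, candidate) == 0


# Illustrative traces for a 2-input AND gate with output `out`,
# sampled for the input patterns 00, 01, 10, 11.
golden_trace = [{"out": "0"}, {"out": "0"}, {"out": "0"}, {"out": "1"}]
good_trace = [{"out": "0"}, {"out": "0"}, {"out": "0"}, {"out": "1"}]
bad_trace = [{"out": "0"}, {"out": "1"}, {"out": "1"}, {"out": "1"}]  # behaves like OR

print(is_functionally_correct(golden_trace, good_trace))
print(count_mismatches(golden_trace, bad_trace))
```

Comparing values as strings means an unknown (`x`) output in the candidate counts as a mismatch against a defined golden value, which is a deliberately strict choice in this sketch.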