The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored to evaluating LLM performance on Verilog code generation for hardware design and verification. We present an evaluation dataset of 156 problems drawn from the Verilog instructional website HDLBits. The problems span a diverse range of Verilog code generation tasks, from simple combinational circuits to complex finite state machines. Generated Verilog completions are tested automatically for functional correctness by comparing the transient simulation outputs of the generated design against those of a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models can be improved through supervised fine-tuning, bootstrapped with LLM-generated synthetic problem-code pairs.
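The functional-correctness check described above, comparing the simulated outputs of a generated design against a golden solution, can be illustrated with a minimal sketch. It assumes the simulator's outputs have already been sampled into per-time-step dictionaries of signal values; the `Trace` type, the function names, and the example traces are hypothetical and are not taken from the benchmark's actual harness.

```python
from typing import Dict, List

# Assumed representation: one dict of output-signal values per sampled time step.
Trace = List[Dict[str, str]]


def count_mismatches(generated: Trace, golden: Trace) -> int:
    """Count time steps at which any golden output differs from the generated one."""
    if len(generated) != len(golden):
        raise ValueError("traces must cover the same number of time steps")
    mismatches = 0
    for gen_step, gold_step in zip(generated, golden):
        # Compare only the signals that the golden solution defines.
        if any(gen_step.get(name) != value for name, value in gold_step.items()):
            mismatches += 1
    return mismatches


def passes(generated: Trace, golden: Trace) -> bool:
    """A completion is functionally correct only if no time step mismatches."""
    return count_mismatches(generated, golden) == 0


# Hypothetical traces for a 2-input AND gate driven with inputs 00, 01, 10, 11.
golden_trace = [{"out": "0"}, {"out": "0"}, {"out": "0"}, {"out": "1"}]
correct_trace = [{"out": "0"}, {"out": "0"}, {"out": "0"}, {"out": "1"}]
buggy_trace = [{"out": "0"}, {"out": "1"}, {"out": "1"}, {"out": "1"}]  # behaves like OR

print(passes(correct_trace, golden_trace))
print(count_mismatches(buggy_trace, golden_trace))
```

In this sketch, a single mismatching time step is enough to fail a completion, which mirrors the all-or-nothing notion of functional correctness; a real harness would obtain the traces by running both designs under the same testbench in a Verilog simulator.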