With the unprecedented advancements in Large Language Models (LLMs), their application domains have expanded to include code generation across many programming languages. While significant progress has been made in improving LLMs for popular programming languages, comprehensive evaluation frameworks for Hardware Description Languages (HDLs), and for VHDL in particular, remain notably lacking. This paper addresses this gap by introducing an evaluation framework designed specifically to assess LLM performance on VHDL code generation. We construct a dataset of 202 problems by translating a collection of Verilog evaluation problems into VHDL and aggregating publicly available VHDL problems. To assess the functional correctness of the generated VHDL code, we use a curated set of self-verifying testbenches designed specifically for this aggregated problem set. We conduct an initial evaluation of several LLMs and their variants under zero-shot code generation, in-context learning (ICL), and parameter-efficient fine-tuning (PEFT). Our findings highlight the considerable challenges existing LLMs face in VHDL code generation and reveal significant room for improvement. This study underscores the need for supervised fine-tuning of code generation models specifically for VHDL, which could benefit VHDL designers seeking efficient code generation solutions.