The increasing complexity and high cost of modern processor design have driven a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages such as Python. However, these methods fail on hardware description languages (HDLs) such as Verilog due to the scarcity of high-quality instruction-tuning data; even advanced LLMs like GPT-3.5 exhibit limited performance on Verilog generation. To address this issue, we make two observations: (1) Verilog code collected from the real world is of higher quality than code generated by LLMs, and (2) LLMs like GPT-3.5 excel at summarizing Verilog code rather than generating it. Based on these observations, this paper introduces CodeV, a series of open-source instruction-tuned Verilog generation LLMs. Instead of first generating descriptions and then obtaining the corresponding code from advanced LLMs, we prompt the LLM with Verilog code and let it generate the corresponding natural-language description via multi-level summarization. Experimental results show that CodeV relatively surpasses the previous open-source SOTA models by 14.4% (BetterV on VerilogEval) and 11.3% (RTLCoder on RTLLM), and also relatively outperforms the previous commercial SOTA, GPT-4, by 22.1% on VerilogEval.
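The data-construction idea in the abstract — pairing collected real-world Verilog code with an LLM-generated description obtained by summarizing in stages — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function names, prompts, and the `llm` callable are all hypothetical placeholders for whatever chat-completion interface is used.

```python
# Hypothetical sketch of summarization-based instruction-data construction:
# start from real Verilog code, ask an LLM for a detailed summary, then
# compress that summary into a spec-style description, and pair the
# description with the original code as one instruction-tuning sample.
# `llm` stands in for any text-in/text-out completion call (illustrative).

def build_instruction_pair(verilog_code: str, llm) -> dict:
    # Level 1: a detailed summary of what the module does.
    detailed = llm(
        "Summarize the functionality of this Verilog module in detail:\n"
        + verilog_code
    )
    # Level 2: rephrase the detailed summary as a concise design request,
    # i.e. the kind of description a user would give before the code existed.
    description = llm(
        "Rewrite this summary as a concise design specification:\n" + detailed
    )
    # The (description -> code) pair is one fine-tuning example.
    return {"instruction": description, "response": verilog_code}


if __name__ == "__main__":
    # Stub LLM for demonstration; a real pipeline would call an API here.
    def stub_llm(prompt: str) -> str:
        return "Design a 2-to-1 multiplexer selecting between inputs a and b."

    sample = build_instruction_pair(
        "module mux2(input a, b, sel, output y);\n"
        "  assign y = sel ? b : a;\n"
        "endmodule",
        stub_llm,
    )
    print(sorted(sample.keys()))  # → ['instruction', 'response']
```

Summarizing real code in this direction (code → description) exploits the abstract's second observation: summarization is the task these LLMs are good at, so the resulting descriptions are reliable while the code side retains real-world quality.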