The increasing complexity and high cost of modern processor design have spurred demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages such as Python. However, these methods struggle on hardware description languages (HDLs) such as Verilog due to the scarcity of high-quality instruction-tuning data; even advanced LLMs such as GPT-3.5 exhibit limited performance on Verilog generation. To address this issue, we make two observations: (1) Verilog code collected from the real world is of higher quality than that generated by LLMs, and (2) LLMs such as GPT-3.5 excel at summarizing Verilog code rather than generating it. Based on these observations, this paper introduces CodeV, a series of open-source instruction-tuned Verilog generation LLMs. Instead of first generating descriptions and then obtaining the corresponding code from advanced LLMs, we prompt the LLM with Verilog code and let it generate the corresponding natural-language description via multi-level summarization. Experimental results show that CodeV relatively surpasses the previous open-source SOTA by 14.4% (BetterV on VerilogEval) and 11.3% (RTLCoder on RTLLM), and relatively outperforms the previous commercial SOTA, GPT-4, by 22.1% on VerilogEval.
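The data-construction idea above (code-to-description rather than description-to-code) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual pipeline: the prompt wording, the two summarization levels, and the `query_llm` callable are all assumptions made for illustration.

```python
# Sketch of multi-level summarization for building instruction-tuning data.
# Stage 1: summarize a real-world Verilog module in detail (block-level behavior).
# Stage 2: condense that detailed summary into a concise high-level description.
# Each resulting (description, code) pair serves as an instruction-tuning sample.
# NOTE: `query_llm` is a hypothetical stand-in for any LLM API call.

def build_detail_prompt(verilog_code: str) -> str:
    return (
        "Summarize the behavior of the following Verilog module, "
        "describing each always block and assignment:\n\n" + verilog_code
    )

def build_condense_prompt(detailed_summary: str) -> str:
    return (
        "Condense the following detailed summary into a concise problem "
        "description that a designer might write:\n\n" + detailed_summary
    )

def make_instruction_pair(verilog_code: str, query_llm) -> dict:
    """Turn collected Verilog code into an (instruction, output) training pair."""
    detailed = query_llm(build_detail_prompt(verilog_code))
    description = query_llm(build_condense_prompt(detailed))
    # The model is then fine-tuned to map description -> original code.
    return {"instruction": description, "output": verilog_code}
```

Because the code (not the description) is the real-world artifact, the high-quality side of each training pair is guaranteed by construction; only the description is synthetic.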