Recent studies have shown that Large Language Models (LLMs) have substantial potential for RTL (Register Transfer Level) code generation, with notable advances from commercial models such as GPT-4 and Claude3-Opus. Despite their proficiency, these commercial LLMs often raise privacy and security concerns. Open-source LLMs address these concerns but underperform commercial models on RTL code generation tasks because high-quality open-source RTL datasets are lacking. To address this issue, we introduce OriGen, a fully open-source framework featuring self-reflection capabilities and a dataset augmentation methodology for generating high-quality, large-scale RTL code. We propose a novel code-to-code augmentation methodology that leverages knowledge distillation to enhance the quality of open-source RTL code datasets. Additionally, OriGen can correct syntactic errors through a self-reflection process based on compiler feedback. This self-reflection ability is enabled by a carefully constructed dataset comprising a comprehensive collection of samples. Experimental results demonstrate that OriGen substantially outperforms other open-source alternatives in RTL code generation, surpassing the previous best-performing LLM by 9.8% on the VerilogEval-Human benchmark. Furthermore, OriGen exhibits superior self-reflection and error-rectification capabilities, surpassing GPT-4 by 18.1% on a benchmark designed to evaluate self-reflection.
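The compiler-feedback self-reflection process described in the abstract can be illustrated with a minimal sketch. This is not OriGen's implementation: the function `self_reflect`, the `generate` and `compile_check` callables, the prompt wording, and the `max_rounds` limit are all hypothetical names and choices introduced here for illustration. In practice `generate` would wrap an LLM call and `compile_check` would wrap a Verilog compiler; toy stand-ins are used below so the sketch runs on its own.

```python
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   generate(prompt)     -> Verilog source text
#   compile_check(code)  -> (compiled_ok, compiler_log)
Generate = Callable[[str], str]
CompileCheck = Callable[[str], Tuple[bool, str]]


def self_reflect(spec: str, generate: Generate, compile_check: CompileCheck,
                 max_rounds: int = 3) -> Tuple[str, bool, int]:
    """Generate RTL for `spec`, then retry with compiler feedback on failure.

    Returns (code, compiled_ok, number_of_repair_rounds_used).
    """
    code = generate(spec)
    for rounds in range(max_rounds + 1):
        ok, log = compile_check(code)
        if ok:
            return code, True, rounds
        if rounds == max_rounds:
            break  # repair budget exhausted
        # Feed the failing code and the compiler log back to the generator.
        prompt = (f"{spec}\n\nPrevious attempt:\n{code}\n\n"
                  f"Compiler errors:\n{log}\n\nFix the code.")
        code = generate(prompt)
    return code, False, max_rounds


# Toy stand-ins so the sketch is runnable without an LLM or a compiler.
def toy_generate(prompt: str) -> str:
    body = "module and2(input a, input b, output y);\n  assign y = a & b;\n"
    # The toy "model" only closes the module once it has seen an error log.
    return body + ("endmodule\n" if "Compiler errors" in prompt else "")


def toy_compile_check(code: str) -> Tuple[bool, str]:
    if "endmodule" in code:
        return True, ""
    return False, "syntax error: missing endmodule"


code, ok, repairs = self_reflect("2-input AND gate", toy_generate, toy_compile_check)
```

Passing the generator and the compiler check as parameters keeps the loop independent of any particular model or toolchain, which is a design choice of this sketch rather than a claim about OriGen.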