Large Language Models (LLMs) have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks. Nevertheless, a significant gap remains in benchmarks that accurately reflect the complexity of real-world RTL projects. To address this gap, this paper presents RTL-Repo, a benchmark specifically designed to evaluate LLMs on large-scale RTL design projects. RTL-Repo includes a comprehensive dataset of more than 4,000 Verilog code samples extracted from public GitHub repositories, with each sample providing the full context of its corresponding repository. We evaluate several state-of-the-art models on the RTL-Repo benchmark, including GPT-4, GPT-3.5, and Starcoder2, alongside Verilog-specific models such as VeriGen and RTLCoder, and compare their performance in generating Verilog code for complex projects. The RTL-Repo benchmark provides a valuable resource for the hardware design community to assess and compare LLMs' performance in real-world RTL design scenarios, and to train LLMs specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open-source and publicly available on GitHub.