Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL from natural-language instructions. However, the target designs in existing works are all relatively simple, small in scale, and proposed by the authors themselves, which makes a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness, without evaluating the design quality of the generated RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL from natural-language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals: the syntax goal, the functionality goal, and the design quality goal. The benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which significantly boosts the performance of GPT-3.5 on our proposed benchmark.
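The three progressive goals can be read as a gated evaluation in which each goal is checked only after the previous one is met. The sketch below illustrates that reading under stated assumptions: the gating order is inferred from the word "progressive", and the names `GoalReport`, `evaluate_progressively`, `check_syntax`, `check_functionality`, and `measure_quality` are hypothetical placeholders, not part of the RTLLM benchmark's actual interface. In practice the three callables would wrap external tools such as a synthesizer or a simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class GoalReport:
    """Outcome of a progressive evaluation (hypothetical structure)."""
    syntax_ok: bool
    functionality_ok: Optional[bool] = None    # None means "not evaluated"
    quality: Optional[Dict[str, float]] = None  # None means "not evaluated"


def evaluate_progressively(
    rtl: str,
    check_syntax: Callable[[str], bool],
    check_functionality: Callable[[str], bool],
    measure_quality: Callable[[str], Dict[str, float]],
) -> GoalReport:
    """Evaluate generated RTL against three goals, stopping at the first failure.

    Assumption: a later goal is only meaningful once the earlier ones pass,
    so functionality is skipped on a syntax failure and quality is skipped
    on a functionality failure.
    """
    report = GoalReport(syntax_ok=check_syntax(rtl))
    if not report.syntax_ok:
        return report

    report.functionality_ok = check_functionality(rtl)
    if not report.functionality_ok:
        return report

    report.quality = measure_quality(rtl)
    return report
```

Passing the checkers in as callables keeps the sketch independent of any particular EDA toolchain; a design that fails an early goal yields a report whose later fields stay `None`, which distinguishes "failed" from "not evaluated".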