Inspired by the recent success of large language models (LLMs) such as ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL from natural-language instructions. However, the target designs in existing works are all relatively simple, small in scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness, without evaluating the quality of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL from natural-language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals: the syntax goal, the functionality goal, and the design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which is shown to significantly boost the performance of GPT-3.5 on our proposed benchmark.