Inspired by the recent success of large language models (LLMs) such as ChatGPT, researchers have begun to explore the use of LLMs for agile hardware design, for example, generating design RTL from natural-language instructions. However, in existing works, the target designs are all relatively simple, small in scale, and proposed by the authors themselves, which makes a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness and do not evaluate the quality of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL from natural-language instructions. To systematically evaluate the auto-generated design RTL, we define three progressive goals: the syntax goal, the functionality goal, and the design quality goal. The benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which we show significantly boosts the performance of GPT-3.5 on the proposed benchmark.