Functional simulation is an essential step in digital hardware design. Recently, there has been growing interest in leveraging Large Language Models (LLMs) for hardware testbench generation. However, the inherent instability of LLMs often leads to functional errors in the generated testbenches. Previous methods lack automatic functional correction mechanisms that work without human intervention, and they still suffer from low success rates, especially on sequential tasks. To address this issue, we propose CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction. Using only the RTL specification in natural language, the proposed approach validates the correctness of the generated testbenches with a success rate of 88.85%. Furthermore, the proposed LLM-based corrector uses bug information obtained during self-validation to perform functional self-correction on the generated testbenches. A comparative analysis shows that our method achieves a pass ratio of 70.13% across all evaluated tasks, compared with 52.18% for a previous LLM-based testbench generation framework and 33.33% for direct LLM-based generation. On sequential circuits in particular, our approach outperforms the previous work by 62.18% and achieves almost 5 times the pass ratio of the direct method. The code and experimental results are open-sourced at: https://github.com/AutoBench/CorrectBench
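To make the workflow concrete, the sketch below outlines the generate / self-validate / self-correct loop the abstract describes. It is a minimal illustration only: all function names (generate_testbench, validate, correct) and the iteration budget MAX_ROUNDS are hypothetical placeholders, not CorrectBench's actual API.

```python
# Hypothetical sketch of a self-validating, self-correcting testbench
# generation loop, as described in the abstract. None of these names
# come from the CorrectBench codebase.

MAX_ROUNDS = 3  # assumed correction budget, not a value from the paper


def generate_testbench(spec: str) -> str:
    """Placeholder: prompt an LLM with the natural-language RTL spec."""
    raise NotImplementedError


def validate(testbench: str, spec: str) -> tuple[bool, str]:
    """Placeholder: functional self-validation against the spec.

    Returns (passed, bug_report); the bug report captures the
    failure information produced during validation.
    """
    raise NotImplementedError


def correct(testbench: str, bug_report: str) -> str:
    """Placeholder: LLM-based corrector guided by the bug report."""
    raise NotImplementedError


def correctbench_style_flow(spec: str) -> str:
    """Generate a testbench, then iterate validate -> correct."""
    tb = generate_testbench(spec)
    for _ in range(MAX_ROUNDS):
        passed, bug_report = validate(tb, spec)
        if passed:
            return tb  # testbench survived self-validation
        tb = correct(tb, bug_report)  # feed bug info back to the corrector
    return tb  # best effort once the correction budget is exhausted
```

The key design point conveyed by the abstract is that the bug information discovered during self-validation is fed back into the corrector, so no human needs to diagnose the failing testbench between rounds.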