Large Language Models (LLMs) have become increasingly popular for generating RTL code. However, producing error-free RTL code in a zero-shot setting remains highly challenging even for state-of-the-art LLMs, which often produce flaws that require manual, iterative refinement. This additional debugging process can dramatically increase the verification workload, underscoring the need for robust, automated correction mechanisms that ensure code correctness from the start. In this work, we introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative correction of both syntax and functional errors. Our approach leverages a collaborative multi-agent system that incorporates feedback from error logs generated by EDA tools to automatically identify and resolve design flaws. Experimental results on the VerilogEval-Human benchmark suite demonstrate that our framework significantly improves code quality, achieving nearly a 3.4$\times$ improvement over prior methods. In the best-case scenario, functional pass rates of 77\% for Verilog and 66\% for VHDL were obtained, substantially improving the reliability of LLM-driven RTL code generation.