The automatic generation of RTL code (e.g., Verilog) from natural language instructions has emerged as a promising direction with the advancement of large language models (LLMs). However, producing RTL code that is both syntactically and functionally correct remains a significant challenge. Existing single-LLM-agent approaches face substantial limitations because one agent must switch between multiple programming languages while handling intricate generation, verification, and modification tasks. To address these challenges, this paper introduces MAGE, the first open-source multi-agent AI system designed for robust and accurate Verilog RTL code generation. We propose a novel high-temperature RTL candidate sampling and debugging system that effectively explores the space of code candidates and significantly improves their quality. Furthermore, we design a novel Verilog-state checkpoint checking mechanism that enables early detection of functional errors and delivers precise feedback for targeted fixes, significantly enhancing the functional correctness of the generated RTL code. MAGE achieves a 95.7% rate of syntactically and functionally correct code generation on the VerilogEval-Human 2 benchmark, surpassing the state-of-the-art Claude-3.5-sonnet by 23.3%, demonstrating a robust and reliable approach for AI-driven RTL design workflows.
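The two mechanisms named above, candidate sampling and checkpoint checking, can be illustrated with a small sketch. This is not MAGE's implementation: the function names (`first_mismatch`, `checkpoint_score`, `pick_best`) and the trace format (one dictionary of signal values per checkpoint) are assumptions made for illustration, and the candidate traces are hard-coded rather than produced by an LLM and a simulator.

```python
from typing import Dict, List, Optional

# Hypothetical trace format: one dict of signal values per checkpoint.
Trace = List[Dict[str, int]]


def first_mismatch(expected: Trace, actual: Trace) -> Optional[int]:
    """Return the index of the earliest checkpoint where the traces differ.

    Returns None when the traces agree at every checkpoint.
    """
    for index, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            return index
    if len(expected) != len(actual):
        # One trace ended early; the first missing checkpoint is the mismatch.
        return min(len(expected), len(actual))
    return None


def checkpoint_score(expected: Trace, actual: Trace) -> float:
    """Fraction of expected checkpoints that the candidate trace matches."""
    if not expected:
        return 1.0
    matched = sum(1 for exp, act in zip(expected, actual) if exp == act)
    return matched / len(expected)


def pick_best(expected: Trace, candidates: Dict[str, Trace]) -> str:
    """Pick the candidate whose trace matches the most checkpoints."""
    return max(candidates, key=lambda name: checkpoint_score(expected, candidates[name]))


# Stand-in data: in a real flow these traces would come from simulating
# sampled RTL candidates against a testbench (an assumption of this sketch).
expected_trace: Trace = [{"q": 0}, {"q": 1}, {"q": 0}]
candidate_traces: Dict[str, Trace] = {
    "cand_a": [{"q": 0}, {"q": 0}, {"q": 0}],  # wrong at checkpoint 1
    "cand_b": [{"q": 0}, {"q": 1}, {"q": 0}],  # matches everywhere
}

best = pick_best(expected_trace, candidate_traces)
mismatch_a = first_mismatch(expected_trace, candidate_traces["cand_a"])
```

In this sketch, the index returned by `first_mismatch` plays the role of the targeted feedback: a debugging step could be told which checkpoint failed first instead of only that the final output was wrong.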