Deploying Large Language Models (LLMs) for Register-Transfer Level (RTL) design has emerged as a promising direction in Electronic Design Automation (EDA). However, silicon-grade correctness remains bottlenecked by: (i) the limited test coverage and reliability of simulation-centric evaluation, (ii) regressions and repair hallucinations introduced by iterative debugging, and (iii) semantic drift as intent is reinterpreted across agent handoffs. In this work, we propose Veri-Sure, a multi-agent framework that establishes a design contract to align the agents' intent and uses a patching mechanism guided by static dependency slicing to perform precise, localized repairs. By integrating a multi-branch verification pipeline that combines trace-driven temporal analysis with formal verification (assertion-based checking and Boolean equivalence proofs), Veri-Sure checks functional correctness beyond what simulation alone can establish. We also introduce VerilogEval-v2-EXT, which extends the original benchmark with 53 additional industrial-grade design tasks and stratified difficulty levels, and show that Veri-Sure achieves state-of-the-art performance in verified-correct RTL code generation, surpassing standalone LLMs and prior agentic systems.