HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs

Integrated Circuit (IC) verification consumes nearly 70% of the IC development cycle, and recent research leverages Large Language Models (LLMs) to generate testbenches automatically and reduce verification overhead. However, LLMs struggle to generate correct testbenches. Unlike high-level programming languages, Hardware Description Languages (HDLs) are scarce in LLM training data, which leads LLMs to produce incorrect code. To address the challenges of using LLMs to generate Universal Verification Methodology (UVM) testbenches and sequences, we propose HAVEN (Hybrid Automated Verification ENgine), which prevents LLMs from writing HDL directly. For UVM testbench generation, HAVEN uses LLM agents to analyze design specifications and produce a structured architectural plan. The HAVEN Template Engine then combines this plan with predefined and protocol-specific templates to generate all UVM components with correct bus-handshake timing. For UVM sequence generation, HAVEN introduces a Protocol-Aware Sequence Domain-Specific Language (DSL) that decomposes sequences into fine-grained step types. A set of predefined DSL patterns first establishes sequences that reach high coverage without LLM involvement. HAVEN then raises coverage further by iteratively using LLM agents to analyze coverage-gap reports and compose additional targeted DSL sequences. Unlike previous work, HAVEN is the first system to generate all UVM components from predefined, protocol-specific Jinja2 templates and all UVM sequences through our proposed Protocol-Aware DSL and rule-based code generator. Experimental results on 19 open-source IP designs spanning three interface protocols (Direct, Wishbone, AXI4-Lite) show that HAVEN achieves 100% compilation success, 90.6% code coverage, and 87.9% functional coverage on average, which is state-of-the-art among LLM-assisted testbench generation systems.
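To make the idea of "fine-grained step types plus a rule-based code generator" concrete, the following is a minimal sketch, not HAVEN's actual DSL. The step types (Write, Read, Idle), the transaction fields (we, addr, data), the base class base_seq, and the task wait_cycles are all hypothetical names chosen for illustration. Plain Python f-strings stand in for the Jinja2 templates mentioned above so the sketch needs only the standard library. The point it shows is that an LLM (or a predefined pattern) only has to emit a list of typed steps, while deterministic rules produce the SystemVerilog text, here using the UVM 1.2 `uvm_do_with` and `uvm_object_utils` macros.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical DSL step types (illustrative only; not HAVEN's real vocabulary).
@dataclass(frozen=True)
class Write:
    addr: int
    data: int


@dataclass(frozen=True)
class Read:
    addr: int


@dataclass(frozen=True)
class Idle:
    cycles: int


Step = Union[Write, Read, Idle]


def render_step(step: Step) -> str:
    """Lower one DSL step to a single SystemVerilog statement by fixed rules.

    The item fields (we, addr, data) and the task wait_cycles are hypothetical;
    a real generator would take them from a protocol-specific template.
    """
    if isinstance(step, Write):
        return (f"`uvm_do_with(req, {{ we == 1; "
                f"addr == 32'h{step.addr:08X}; data == 32'h{step.data:08X}; }})")
    if isinstance(step, Read):
        return f"`uvm_do_with(req, {{ we == 0; addr == 32'h{step.addr:08X}; }})"
    if isinstance(step, Idle):
        return f"wait_cycles({step.cycles});"
    raise TypeError(f"unknown step type: {type(step).__name__}")


def render_sequence(name: str, steps: List[Step]) -> str:
    """Wrap the lowered steps in the body() task of a hypothetical sequence class."""
    body = "\n".join("    " + render_step(s) for s in steps)
    return (
        f"class {name} extends base_seq;\n"
        f"  `uvm_object_utils({name})\n"
        f"  function new(string name = \"{name}\");\n"
        f"    super.new(name);\n"
        f"  endfunction\n"
        f"  task body();\n"
        f"{body}\n"
        f"  endtask\n"
        f"endclass"
    )


if __name__ == "__main__":
    # A write, a short idle gap, then a read-back of the same address.
    seq = [Write(0x10, 0xDEADBEEF), Idle(2), Read(0x10)]
    print(render_sequence("wr_rd_seq", seq))
```

Because every step type has exactly one lowering rule per protocol, malformed handshakes cannot appear in the generated code; an incorrect step list can only produce a legal but unhelpful sequence, which the coverage-gap loop can then correct.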