Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, remains challenging due to the scarcity of high-quality training data. Current approaches to LLM-based Verilog code generation often focus on syntactic correctness, resulting in code that compiles but contains functional errors. To address these challenges, we propose AutoVeriFix+, a novel three-stage framework that integrates high-level semantic reasoning with state-space exploration to enhance functional correctness and design efficiency. In the first stage, an LLM generates high-level Python reference models that define the intended circuit behavior. In the second stage, another LLM generates initial Verilog RTL candidates and iteratively fixes syntactic errors. In the third stage, we introduce a concolic testing engine to exercise deep sequential logic and expose corner-case vulnerabilities. Using cycle-accurate execution traces and internal register snapshots, AutoVeriFix+ provides the LLM with the causal context necessary to resolve complex state-transition errors. Furthermore, it generates a coverage report that identifies functionally redundant branches, enabling the LLM to perform semantic pruning for area optimization. Experimental results demonstrate that AutoVeriFix+ achieves over 80% functional correctness on rigorous benchmarks, reaching a pass@10 score of 90.2% on the VerilogEval-machine dataset. In addition, it eliminates an average of 25% of redundant logic across benchmarks through trace-aware optimization.