RTL program repair remains a critical bottleneck in hardware design and verification. Traditional automatic program repair (APR) methods rely on predefined templates and synthesis, which limits their bug coverage. Large language models (LLMs) and the coding agents built on them offer more flexibility but suffer from randomness and context corruption when handling long RTL code and waveforms. We present Clover, a neural-symbolic agentic harness that orchestrates RTL repair as a structured search over code manipulations to find a validated fix for the bug. Recognizing that different repair operations favor distinct strategies, Clover dynamically dispatches tasks to specialized LLM agents or symbolic solvers. At its core, Clover introduces stochastic tree-of-thoughts, a test-time scaling mechanism that manages the main agent's context as a search tree, balancing exploration and exploitation for reliable outcomes. An RTL-specific toolbox further lets agents interact with the debugging environment. Evaluated on the RTL-repair benchmark, Clover fixes 96.8% of bugs within a fixed time limit, covering 94% and 63% more bugs than purely traditional and LLM-based baselines, respectively, while achieving an average pass@1 rate of 87.5%, demonstrating high reliability and effectiveness.
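The abstract does not specify how stochastic tree-of-thoughts selects which branch of the search tree to continue. The following is only a minimal sketch of one plausible reading, assuming softmax sampling over node scores: higher-scoring nodes are expanded more often (exploitation) while every node keeps a nonzero chance of being revisited (exploration). The names `Node`, `softmax_pick`, `stochastic_tree_search`, `propose`, and `score` are hypothetical and not taken from Clover, and the toy string-matching task merely stands in for an LLM edit proposal validated against a testbench.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str                       # candidate code; stands in for an agent context
    score: float                     # heuristic quality, e.g. fraction of checks passed
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def softmax_pick(nodes: List[Node], temperature: float, rng: random.Random) -> Node:
    """Sample a node with probability proportional to exp(score / temperature)."""
    top = max(n.score for n in nodes)  # subtract the max for numerical stability
    weights = [math.exp((n.score - top) / temperature) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def stochastic_tree_search(
    root_state: str,
    propose: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    is_fixed: Callable[[str], bool],
    budget: int = 3000,
    temperature: float = 0.2,
    seed: int = 0,
) -> Optional[Node]:
    """Grow a tree of candidates; return the first validated node, or None."""
    rng = random.Random(seed)
    nodes = [Node(root_state, score(root_state))]
    for _ in range(budget):
        node = softmax_pick(nodes, temperature, rng)   # choose a branch to continue
        candidate = propose(node.state, rng)           # assumed: one edit per step
        child = Node(candidate, score(candidate), parent=node)
        node.children.append(child)
        nodes.append(child)
        if is_fixed(candidate):                        # assumed: external validation
            return child
    return None


# Toy stand-in for a repair task: reach a "golden" expression by single-character edits.
TARGET = "a & b"
ALPHABET = "abc&| "


def propose(state: str, rng: random.Random) -> str:
    i = rng.randrange(len(state))
    return state[:i] + rng.choice(ALPHABET) + state[i + 1:]


def score(state: str) -> float:
    return sum(x == y for x, y in zip(state, TARGET)) / len(TARGET)


fixed = stochastic_tree_search("a | c", propose, score, lambda s: s == TARGET)
print(fixed.state if fixed else "no fix within budget")
```

In this sketch the temperature controls the trade-off: a value near zero makes the search nearly greedy, while a large value makes it close to uniform sampling over all explored nodes. Following `parent` links from the returned node recovers the sequence of edits that led to the validated candidate.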