Large Language Models (LLMs) show promise for generating Register-Transfer Level (RTL) code from natural language specifications, but single-shot generation achieves only 60-65% functional correctness on standard benchmarks. Multi-agent approaches such as MAGE reach 95.9% on VerilogEval yet remain untested on harder industrial benchmarks such as NVIDIA's CVDP, lack synthesis awareness, and incur high API costs. We present ChipCraftBrain, a framework combining symbolic-neural reasoning with adaptive multi-agent orchestration for automated RTL generation. Four innovations drive the system: (1) adaptive orchestration of six specialized agents by a PPO policy over a 168-dimensional state (an alternative world-model MPC planner is also evaluated); (2) a hybrid symbolic-neural architecture that solves K-map and truth-table problems algorithmically while specialized agents handle waveform timing and general RTL; (3) knowledge-augmented generation from a 321-pattern base plus 971 open-source reference implementations with focus-aware retrieval; and (4) hierarchical specification decomposition into dependency-ordered sub-modules with interface synchronization. On VerilogEval-Human, ChipCraftBrain achieves 97.2% mean pass@1 (range 96.15-98.72% across 7 runs, best 154/156), on par with ChipAgents (97.4%, self-reported) and ahead of MAGE (95.9%). On a 302-problem non-agentic subset of CVDP spanning five task categories, we reach 94.7% mean pass@1 (286/302, averaged over 3 runs), a 36-60 percentage-point lift per category over the published single-shot baseline. We also lead three of the four categories shared with NVIDIA's ACE-RTL despite using roughly 30x fewer per-problem attempts. A RISC-V SoC case study demonstrates hierarchical decomposition generating 8/8 lint-passing modules (689 LOC) validated on FPGA, where monolithic generation fails entirely.
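To make the idea of solving truth-table problems algorithmically rather than with an LLM concrete, the following is a minimal sketch, not ChipCraftBrain's actual solver, which the abstract does not describe. It assumes the problem has already been parsed into a list of input names and the minterm indices where the output is 1, and it emits an unminimized canonical sum-of-products expression; a real K-map solver would also minimize the terms. The function names `truth_table_to_sop` and `emit_module` are hypothetical and chosen for illustration only.

```python
def truth_table_to_sop(inputs, minterms):
    """Build a canonical sum-of-products Verilog expression.

    inputs   -- list of signal names; the first name is the most significant bit
    minterms -- row indices of the truth table where the output is 1
    Assumption: no minimization is performed (illustrative sketch only).
    """
    n = len(inputs)
    rows = sorted(set(minterms))
    if any(m < 0 or m >= 2 ** n for m in rows):
        raise ValueError("minterm index out of range for the given inputs")
    if not rows:
        return "1'b0"  # output is never asserted
    if len(rows) == 2 ** n:
        return "1'b1"  # output is always asserted
    terms = []
    for m in rows:
        literals = []
        for i, name in enumerate(inputs):
            bit = (m >> (n - 1 - i)) & 1  # extract this input's bit, MSB first
            literals.append(name if bit else f"~{name}")
        terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms)


def emit_module(name, inputs, output, minterms):
    """Wrap the expression in a small combinational Verilog module."""
    ports = ", ".join(f"input {p}" for p in inputs) + f", output {output}"
    expr = truth_table_to_sop(inputs, minterms)
    return f"module {name}({ports});\n  assign {output} = {expr};\nendmodule\n"


if __name__ == "__main__":
    # Two-input XOR: the output is 1 on rows 1 (a=0, b=1) and 2 (a=1, b=0).
    print(emit_module("xor2", ["a", "b"], "y", [1, 2]))
```

Because the expression is derived directly from the truth table, it is correct by construction, which is the general appeal of routing such problems to a symbolic path instead of sampling them from a model.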