ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation

Zhongkai Yu, Yichen Lin, Chenyang Zhou, Yuwei Zhang, Kun Zhou, Junxia Cui, Haotian Ye, Zhengding Hu, Zaifeng Pan, Ruiyi Wang, Yujie Zhao, Hejia Zhang, Jingbo Shang, Jishen Zhao, Yufei Ding

Existing API-based agentic systems for RTL code generation are fundamentally misaligned with industrial practice: they assume a golden testbench is available at generation time, rely on closed-source APIs that are incompatible with chip vendors' air-gapped security requirements, and cannot be trained on vendors' proprietary RTL codebases, leaving valuable internal data unused. Recent self-trained models address the deployment constraint but remain single-turn generators that overlook the critical role of verification in real industrial flows. To bridge these gaps, we present ChipMATE, the first self-trained multi-agent framework for RTL generation. Inspired by industrial practice, where correctness emerges from cross-comparison between independently written RTL modules and reference models, ChipMATE pairs a Verilog agent with a Python reference-model agent; the two agents verify each other's outputs without any golden oracle. We design a backtrack-based inference workflow to prevent error propagation across turns, and a two-stage training pipeline that first trains each agent individually to saturate its code-generation capability, then trains the team jointly to collaborate effectively. To support training, we further build a hybrid data-generation framework that produces 64.4K high-quality reference-model training samples. ChipMATE achieves 75.0% and 80.1% pass@1 on VerilogEval V2 with 4B and 9B base models, respectively, outperforming all existing self-trained models and even DeepSeek V4 with 1600B parameters. Our code and model weights are publicly available at https://github.com/zhongkaiyu/ChipMATE.
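
The cross-comparison and backtracking ideas described above can be illustrated with a toy sketch. This is not ChipMATE's implementation: the two "agents" are stand-ins (plain Python callables modeling an 8-bit adder), and the names `mismatch_count`, `run_workflow`, and the acceptance rule (keep a revision pair only if it does not increase disagreement) are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a design: maps two 8-bit inputs to an output.
Model = Callable[[int, int], int]
Vector = Tuple[int, int]


def mismatch_count(dut: Model, ref: Model, vectors: Sequence[Vector]) -> int:
    """Count stimuli on which the RTL-side model and the reference model disagree."""
    return sum(dut(a, b) != ref(a, b) for a, b in vectors)


def run_workflow(
    dut_revisions: Sequence[Model],
    ref_revisions: Sequence[Model],
    vectors: Sequence[Vector],
) -> Tuple[Model, Model, List[int]]:
    """Walk through per-turn revisions without a golden oracle.

    Assumed rule: a revision pair is kept only if it does not increase the
    disagreement between the two sides; otherwise we backtrack to the last
    kept pair so that a bad turn cannot propagate.
    """
    best_dut, best_ref = dut_revisions[0], ref_revisions[0]
    best = mismatch_count(best_dut, best_ref, vectors)
    history = [best]
    for dut, ref in zip(dut_revisions[1:], ref_revisions[1:]):
        if best == 0:
            break  # both sides already agree on every stimulus
        score = mismatch_count(dut, ref, vectors)
        if score <= best:
            best_dut, best_ref, best = dut, ref, score
        # else: backtrack, i.e. discard this turn's revisions
        history.append(best)
    return best_dut, best_ref, history


# Deterministic stimulus grid: 16 x 16 = 256 input pairs.
vectors = [(a, b) for a in range(0, 256, 17) for b in range(0, 256, 17)]

# Simulated turns for the "Verilog" side of an 8-bit wrapping adder.
dut_v0 = lambda a, b: a + b            # bug: no 8-bit wraparound
dut_v1 = lambda a, b: (a - b) & 0xFF   # regression: wrong operator
dut_v2 = lambda a, b: (a + b) & 0xFF   # fixed
# Simulated "Python reference model" side, unchanged across turns.
ref_v0 = lambda a, b: (a + b) & 0xFF

final_dut, final_ref, history = run_workflow(
    [dut_v0, dut_v1, dut_v2], [ref_v0, ref_v0, ref_v0], vectors
)
print(history)
```

In this toy run the second turn makes the disagreement worse and is discarded, while the third turn is accepted. Note that agreement between two models is only a proxy signal: both sides could share the same bug, which is why the sketch should be read as an illustration of the control flow rather than a correctness guarantee.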
