ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning

Large Language Models (LLMs) have emerged as powerful tools for automating Register-Transfer Level (RTL) code generation, yet they face a critical limitation: existing approaches typically fail to optimize functional correctness and hardware efficiency metrics such as Power, Performance, and Area (PPA) simultaneously. Methods relying on supervised fine-tuning commonly produce functionally correct but suboptimal designs because they lack an inherent mechanism for learning hardware optimization principles. Conversely, external post-processing techniques that refine PPA after generation are often inefficient and do not improve the LLMs' intrinsic capabilities. To overcome these challenges, we propose ChipSeek, a novel reinforcement learning framework built on hierarchical rewards and designed to encourage LLMs to generate RTL code that is both functionally correct and optimized for PPA. Our approach integrates direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, fostering a nuanced understanding of hardware design trade-offs. Through Curriculum-Guided Dynamic Policy Optimization (CDPO), ChipSeek enhances the LLM's ability to generate high-quality, optimized RTL code. Evaluations on standard benchmarks show that ChipSeek achieves state-of-the-art functional correctness and PPA performance. Furthermore, when individually targeting fine-grained optimization goals such as power, delay, and area, it consistently yields highly efficient designs. The artifact is open-sourced at https://github.com/rong-hash/chipseek.
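To make the idea of a hierarchical reward concrete, the sketch below gates each reward tier on the previous one: code that does not compile is penalized, compiled code earns partial credit for its testbench pass rate, and only fully correct designs receive a PPA term computed relative to a reference design. This is a minimal illustration under stated assumptions, not ChipSeek's actual reward. The class names `PPA` and `EvalResult`, the function `hierarchical_reward`, the tier values (-1.0, [0, 1), [1.0, 2.0]), the equal weights, and the relative-improvement normalization are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PPA:
    """Post-synthesis metrics (hypothetical); lower is better for all three."""
    power: float
    delay: float
    area: float


@dataclass
class EvalResult:
    """Hypothetical summary of EDA feedback for one generated design."""
    compiles: bool             # did the simulator/synthesizer accept the code?
    tests_passed: int          # testbench cases that matched the reference
    tests_total: int
    ppa: Optional[PPA] = None  # None if synthesis produced no report


def _clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def hierarchical_reward(
    result: EvalResult,
    reference: PPA,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Gated reward: syntax -> functionality -> PPA (illustrative values only)."""
    # Tier 0: code that fails to compile gets a fixed penalty.
    if not result.compiles:
        return -1.0
    # Tier 1: partial credit for functional correctness, in [0, 1).
    if result.tests_total <= 0:
        return 0.0
    pass_rate = result.tests_passed / result.tests_total
    if pass_rate < 1.0:
        return pass_rate
    # Tier 2: fully correct designs always score >= 1.0, so PPA can reorder
    # correct designs but never lift an incorrect one above them.
    if result.ppa is None:
        return 1.0
    # Relative improvement over the reference (positive = better), per metric.
    improvements = [
        (reference.power - result.ppa.power) / reference.power,
        (reference.delay - result.ppa.delay) / reference.delay,
        (reference.area - result.ppa.area) / reference.area,
    ]
    # Weighted average of clamped improvements; stays in [-1, 1] for
    # non-negative weights.
    bonus = sum(w * _clamp(i) for w, i in zip(weights, improvements)) / sum(weights)
    # Map bonus from [-1, 1] onto the tier range [1.0, 2.0].
    return 1.5 + 0.5 * bonus
```

For example, a fully correct design whose power is half that of the reference, with equal delay and area, scores about 1.583, while a correct design that is worse than the reference on every metric still scores 1.0, above any design that fails a test. Setting the `weights` tuple to favor a single metric is one hypothetical way to express the single-objective (power-, delay-, or area-only) targeting the abstract mentions.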
