Large Language Models (LLMs) have been applied to various hardware design tasks, including Verilog code generation, EDA tool scripting, and RTL bug fixing. Despite this extensive exploration, LLMs have yet to be applied to post-synthesis metric reasoning and estimation for HDL designs. In this paper, we assess the ability of LLMs to reason about post-synthesis metrics of Verilog designs. We introduce MetRex, a large-scale dataset comprising 25,868 Verilog HDL designs and their corresponding post-synthesis metrics, namely area, delay, and static power. MetRex incorporates a Chain-of-Thought (CoT) template to enhance LLMs' reasoning about these metrics. Extensive experiments show that Supervised Fine-Tuning (SFT) boosts LLMs' reasoning capabilities by 37.0\%, 25.3\%, and 25.7\% on average for area, delay, and static power, respectively. While SFT improves performance on our benchmark, it remains far from optimal, especially on complex problems. Compared to state-of-the-art regression models, our approach delivers accurate post-synthesis predictions for 17.4\% more designs (within a 5\% error margin) and offers a 1.7x speedup by eliminating the need for preprocessing. This work lays the groundwork for advancing LLM-based Verilog code metric reasoning.