Rapid advances in language models (LMs) have created new opportunities for automated code generation while complicating trade-offs between model characteristics and prompt design choices. In this work, we provide an empirical map of recent trends in LMs for Verilog code generation, focusing on interactions among model reasoning, specialization, and prompt engineering strategies. We evaluate a diverse set of small and large LMs, including general-purpose, reasoning, and domain-specific variants. Our experiments use a controlled factorial design spanning benchmark prompts, structured outputs, prompt rewriting, chain-of-thought reasoning, in-context learning, and evolutionary prompt optimization via Genetic-Pareto. Across two Verilog benchmarks, we identify patterns in how model classes respond to structured prompts and optimization, and we document which trends generalize across LMs and benchmarks versus those that are specific to particular model-prompt combinations.