Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. However, existing approaches typically rely on free-form natural language descriptions that are often ambiguous, redundant, and unstructured, which poses significant challenges for downstream Verilog code generation. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. To bridge this gap, we introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing its expression for precise Verilog code generation. We further design a two-stage training framework, comprising Joint Expression Modeling and Dual-Space Optimization, to enhance the quality of both CRUX and Verilog code. Experiments across multiple Verilog generation benchmarks demonstrate that our model, CRUX-V, achieves state-of-the-art performance among general models, particularly on challenging design tasks. Furthermore, the CRUX space proves transferable and beneficial when used as input prompts for other code models, highlighting its effectiveness in narrowing the gap between free-form natural language descriptions and precise Verilog generation.