Edge applications increasingly demand custom hardware, yet Field-Programmable Gate Array (FPGA) design requires expertise that domain engineers lack. Large Language Models (LLMs) promise to bridge this gap through zero-knowledge hardware programming. In this approach, users describe circuits in natural language, and an LLM compiles the descriptions into a hardware intermediate representation (IR) that targets silicon. This work models the flow as a cascade of binary filters and shows that the choice of IR, not the choice of model, is the dominant factor in end-to-end success. This phenomenon is termed the representation bottleneck. The evaluation covers three frontier LLMs across six IRs (Verilog, VHDL, Chisel, Bluespec, PyMTL3, and HLS C) on 202 tasks. Each design passes through a pipeline of compilation, simulation, FPGA synthesis on a Lattice iCE40UP5K, and LLM-based repair. Simulation pass rates range from 3% to 88% across IRs, yet within any single IR they typically vary by less than 1.25× across models. On the resource-constrained iCE40, LLM designs achieve a higher conditional FPGA pass rate than the reference solutions (86.5% vs. 68.7%). They do not succeed because they are better designs. They succeed because a simplicity bias keeps them small enough to fit on the device. The analysis also reveals an accessibility-competence paradox: the most user-friendly IRs yield the worst LLM performance. This suggests that the optimal choice of IR will shift as LLM capabilities grow.
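The cascade-of-binary-filters model can be sketched as follows. The sketch assumes that each pipeline stage passes a design with some probability, conditional on the design having survived every earlier stage. Under that assumption, the end-to-end success rate is the product of the per-stage conditional rates, and the stage with the lowest conditional rate is the binding filter. This also explains why a high *conditional* FPGA pass rate does not by itself mean that more designs reach hardware overall. The stage names, helper functions, and example rates below are hypothetical illustrations. They are not the paper's code or data.

```python
from math import prod


def end_to_end_rate(conditional_rates: dict[str, float]) -> float:
    """Probability that a design survives every stage of the cascade.

    Assumes each value is P(pass stage k | passed stages 1..k-1), so the
    end-to-end probability is simply the product of the conditional rates.
    """
    for stage, p in conditional_rates.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"rate for stage {stage!r} must be in [0, 1], got {p}")
    return prod(conditional_rates.values())


def expected_survivors(n_tasks: int, conditional_rates: dict[str, float]) -> list[tuple[str, float]]:
    """Expected number of designs remaining after each stage, in pipeline order."""
    remaining = float(n_tasks)
    trace = []
    for stage, p in conditional_rates.items():
        remaining *= p  # each stage filters the designs that reached it
        trace.append((stage, remaining))
    return trace


def bottleneck_stage(conditional_rates: dict[str, float]) -> str:
    """Stage with the lowest conditional pass rate, i.e. the binding filter."""
    return min(conditional_rates, key=conditional_rates.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-stage conditional rates for one (model, IR) pair.
    rates = {"compile": 0.9, "simulate": 0.3, "fpga_synthesis": 0.8}
    print(end_to_end_rate(rates))
    print(expected_survivors(202, rates))  # 202 tasks, as in the abstract
    print(bottleneck_stage(rates))
```

In this hypothetical example, a strong FPGA stage cannot compensate for a weak simulation stage, because the product is capped by its smallest factor. This mirrors the abstract's point that a change which mostly moves one stage's rate, such as switching IR, can dominate end-to-end success.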