New AI accelerators with novel instruction set architectures (ISAs) often require developers to hand-craft low-level kernels, a time-consuming and error-prone process that does not scale across hardware targets and delays the time to market of emerging hardware platforms. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft, the first benchmark for evaluating an LLM agent's ability to generate and optimize low-level kernels for customized accelerators through a function-calling, feedback-driven workflow. We evaluate agent performance across three emerging accelerators on more than 20 machine-learning tasks, each with five diverse task configurations. Across four leading reasoning models, the strongest agents generate functionally correct kernels for unseen ISAs within a few refinement steps and produce optimized kernels that match or outperform compiler baselines. These results demonstrate KernelCraft's potential to shorten the accelerator chip development cycle. KernelCraft is available at https://kernelcraft-cam.github.io/.