Towards Automated Kernel Generation in the Era of LLMs

Yang Yu, Peiyu Zang, Chi Hsu Tsai, Haiming Wu, Yixin Shen, Jialing Zhang, Haoyu Wang, Zhiyou Xiao, Jingze Shi, Yuyu Luo, Wentao Zhang, Chunlei Men, Guang Liu, Yonghua Lin

From arXiv, 10 pages, 1 figure

The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, which makes kernel engineering critical but notoriously time-consuming and hard to scale. Recent advances in large language models (LLMs) and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well suited to compressing expert-level kernel knowledge that is difficult to formalize, while agentic systems enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Progress in this area has been rapid, but the field remains fragmented and lacks a systematic perspective on LLM-driven kernel generation. This survey addresses that gap by providing a structured overview of existing approaches, spanning LLM-based approaches and agentic optimization workflows, and by systematically compiling the datasets and benchmarks that underpin learning and evaluation in this domain. It also outlines key open challenges and future research directions, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.
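The "iterative, feedback-driven loop" mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of any system covered by the survey; it only shows the general propose, evaluate, refine pattern. Every name here (`Feedback`, `evaluate`, `optimize`, `scripted_proposer`, and the toy sum-of-squares "kernels") is a hypothetical stand-in, and the proposer is a fixed script where a real system would call an LLM with the accumulated feedback.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A toy "kernel" is just a Python function over a sequence of floats (assumption).
Kernel = Callable[[Sequence[float]], float]


@dataclass
class Feedback:
    correct: bool    # did the candidate match the reference on every input?
    seconds: float   # best-of-N wall-clock time; inf if incorrect


History = List[Tuple[Kernel, Feedback]]


def evaluate(kernel: Kernel, reference: Kernel,
             inputs: Sequence[Sequence[float]], repeats: int = 3) -> Feedback:
    """Check correctness against the reference, then time the candidate."""
    for xs in inputs:
        if not math.isclose(kernel(xs), reference(xs), rel_tol=1e-9, abs_tol=1e-9):
            return Feedback(correct=False, seconds=math.inf)
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        for xs in inputs:
            kernel(xs)
        best = min(best, time.perf_counter() - start)
    return Feedback(correct=True, seconds=best)


def optimize(propose: Callable[[History], Optional[Kernel]], reference: Kernel,
             inputs: Sequence[Sequence[float]], rounds: int) -> Tuple[Optional[Kernel], History]:
    """Run the loop: propose a candidate, evaluate it, feed the result back."""
    best_kernel: Optional[Kernel] = None
    best_feedback: Optional[Feedback] = None
    history: History = []
    for _ in range(rounds):
        candidate = propose(history)   # an LLM call would go here
        if candidate is None:          # proposer has nothing more to try
            break
        feedback = evaluate(candidate, reference, inputs)
        history.append((candidate, feedback))
        if feedback.correct and (best_feedback is None
                                 or feedback.seconds < best_feedback.seconds):
            best_kernel, best_feedback = candidate, feedback
    return best_kernel, history


def sum_squares_reference(xs: Sequence[float]) -> float:
    return sum(x * x for x in xs)


def buggy_candidate(xs: Sequence[float]) -> float:
    return sum(xs)  # forgets to square, so it fails the correctness check


def loop_candidate(xs: Sequence[float]) -> float:
    total = 0.0
    for x in xs:
        total += x * x
    return total


def scripted_proposer(history: History) -> Optional[Kernel]:
    """Stand-in for an LLM: returns the next scripted candidate, then None."""
    queue = [buggy_candidate, loop_candidate]
    return queue[len(history)] if len(history) < len(queue) else None


INPUTS = [[1.0, 2.0, 3.0], [0.5] * 100]
best, history = optimize(scripted_proposer, sum_squares_reference, INPUTS, rounds=5)
```

In this sketch the first candidate is rejected on correctness and the second is accepted, so `best` ends up as `loop_candidate`. A real agentic workflow would replace the scripted proposer with a model call conditioned on `history`, and the toy functions with compiled kernels profiled on target hardware.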
