The performance of modern AI systems is fundamentally constrained by the quality of their underlying kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, making kernel engineering a critical but notoriously time-consuming and non-scalable process. Recent advances in large language models (LLMs) and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well suited to compressing expert-level kernel knowledge that is difficult to formalize, while agentic systems enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Although rapid progress has been made in this area, the field remains fragmented and lacks a systematic perspective on LLM-driven kernel generation. This survey addresses this gap by providing a structured overview of existing work, spanning LLM-based generation methods and agentic optimization workflows, and by systematically compiling the datasets and benchmarks that underpin learning and evaluation in this domain. We further outline key open challenges and future research directions, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.