KEN: Kernel Extensions using Natural Language

The ability to modify and extend an operating system is an important feature for improving a system's security, reliability, and performance. The extended Berkeley Packet Filters (eBPF) ecosystem has emerged as the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel that the system will execute before or after existing logic. While the eBPF ecosystem provides a flexible mechanism for kernel extension, it is difficult for developers to write eBPF programs today. An eBPF developer must have deep knowledge of the internals of the operating system to determine where to place logic and cope with programming limitations on the control flow and data accesses of their eBPF program enforced by the eBPF verifier. This paper presents KEN, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. KEN uses recent advances in large language models (LLMs) to synthesize an eBPF program given a user's English language prompt. To ensure that LLM's output is semantically equivalent to the user's prompt, KEN employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension and build on the recent success that LLMs have shown for each of these tasks individually. To evaluate KEN, we developed a new corpus of natural language prompts for eBPF programs. We show that KEN produces correct eBPF programs on 80% which is an improvement of a factor of 2.67 compared to an LLM-empowered program synthesis baseline.

翻译：修改和扩展操作系统是提升系统安全性、可靠性和性能的重要能力。扩展型柏克莱封包过滤器（eBPF）生态系统已成为扩展Linux内核的标准机制，并已移植到Windows系统。eBPF程序向内核注入新逻辑，系统会在现有逻辑之前或之后执行这些逻辑。尽管eBPF生态系统为内核扩展提供了灵活的机制，但目前开发者编写eBPF程序仍面临困难。eBPF开发者必须深入了解操作系统内部机制，以确定逻辑放置位置，同时需应对eBPF验证器强制施加的程序控制流和数据访问限制。本文提出KEN框架，通过允许用户使用自然语言编写内核扩展，缓解了eBPF编程的难度。KEN利用大语言模型（LLM）的最新进展，根据用户的英文提示合成eBPF程序。为确保LLM输出与用户提示语义等价，KEN结合了LLM赋能的程序理解、符号执行及一系列反馈循环。KEN的核心创新在于对这些技术的整合。具体而言，系统采用新型结构的符号执行方法，将程序合成与程序理解的结果相结合，并基于LLM在这两项任务中各自展现的成果进行构建。为评估KEN，我们开发了新的eBPF程序自然语言提示语料库。实验表明，KEN在80%的案例中生成了正确的eBPF程序，相较LLM赋能的程序合成基线方案，性能提升了2.67倍。