Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detect potential kernel bugs or vulnerabilities. Syzkaller, one of the most widely studied kernel fuzzers, aims to generate valid syscall sequences based on predefined specifications written in syzlang, a domain-specific language for defining syscalls, their arguments, and the relationships between them. Although prior work has tried to automate Syzkaller specification generation, writing specifications remains largely manual, and a large number of important syscalls are still not covered. In this paper, we propose KernelGPT, the first approach to automatically infer Syzkaller specifications via Large Language Models (LLMs) for enhanced kernel fuzzing. Our basic insight is that LLMs have seen massive amounts of kernel code, documentation, and use cases during pre-training, and thus can automatically distill the information needed to make valid syscalls. More specifically, KernelGPT uses an iterative approach to automatically infer all the necessary specification components, and then uses validation feedback to repair and refine the initial specifications. Our preliminary results demonstrate that KernelGPT can help Syzkaller achieve higher coverage and find multiple previously unknown bugs. We also received a request from the Syzkaller team to upstream specifications inferred by KernelGPT.
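The infer-then-repair loop described in the abstract can be illustrated with a minimal Python sketch. This is not KernelGPT's actual implementation: `infer_spec`, `toy_generate`, `toy_validate`, the `max_repairs` parameter, and the syzlang-like strings are all hypothetical stand-ins chosen for illustration. In a real setting, the generator would presumably query an LLM with kernel source code, and the validator would be a real syzlang checker.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   Generate: (kernel source, validator errors so far) -> candidate spec text
#   Validate: (candidate spec text) -> list of error messages (empty if valid)
Generate = Callable[[str, List[str]], str]
Validate = Callable[[str], List[str]]


def infer_spec(
    source: str,
    generate: Generate,
    validate: Validate,
    max_repairs: int = 3,
) -> Tuple[str, List[str]]:
    """Draft a spec, then repair it with validator feedback.

    Stops when the validator reports no errors or after max_repairs
    repair attempts. Returns the last spec and its remaining errors.
    """
    spec = generate(source, [])
    errors = validate(spec)
    repairs = 0
    while errors and repairs < max_repairs:
        # Feed the validator's messages back into the generator.
        spec = generate(source, errors)
        errors = validate(spec)
        repairs += 1
    return spec, errors


def toy_generate(source: str, errors: List[str]) -> str:
    """Toy stand-in for an LLM: the first draft has an unresolved type."""
    arg_type = "demo_info" if errors else "UNKNOWN"
    return f"ioctl$DEMO_GET(fd fd_demo, cmd const[DEMO_GET], arg ptr[out, {arg_type}])"


def toy_validate(spec: str) -> List[str]:
    """Toy stand-in for a spec validator: flags the placeholder type."""
    return ["unknown type UNKNOWN"] if "UNKNOWN" in spec else []


if __name__ == "__main__":
    spec, errors = infer_spec("/* driver source */", toy_generate, toy_validate)
    print(spec)
    print("remaining errors:", errors)
```

The sketch returns the remaining errors alongside the spec so that a caller can discard specifications that never validate instead of passing them to the fuzzer.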