The Linux kernel is critical infrastructure that serves as the foundation for numerous systems. Bugs in the Linux kernel can have serious consequences, affecting billions of users. Fault localization (FL), which aims to identify the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising FL accuracy on benchmarks such as SWE-bench, it remains unclear how well these methods perform on the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at file level. To address this challenge, we propose LinuxFL$^+$, an enhancement framework designed to improve the FL effectiveness of LLM agents for the Linux kernel. LinuxFL$^+$ substantially improves the FL accuracy of all studied agents (e.g., an accuracy increase of 7.2%-11.2%) at minimal cost.
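For readers unfamiliar with the metric, file-level top-k accuracy can be sketched as follows. This is a minimal illustration, not the benchmark's evaluation code: it assumes a bug counts as localized when at least one ground-truth buggy file appears among the top k files ranked by the agent, and the function name, bug IDs, and file paths are hypothetical.

```python
from typing import Dict, List, Set


def top_k_accuracy(
    rankings: Dict[str, List[str]],
    buggy_files: Dict[str, Set[str]],
    k: int = 1,
) -> float:
    """Fraction of bugs with at least one truly buggy file in the top-k predictions.

    Assumption: a "hit" means any ground-truth file appears in the first k
    entries of the ranked list; the paper may define the metric differently.
    """
    if not rankings:
        return 0.0
    hits = 0
    for bug_id, ranked in rankings.items():
        truth = buggy_files.get(bug_id, set())
        # Only the first k ranked files are considered for this bug.
        if any(path in truth for path in ranked[:k]):
            hits += 1
    return hits / len(rankings)


# Hypothetical toy data: two bugs, each with a ranked list of candidate files.
rankings = {
    "bug-1": ["drivers/net/a.c", "kernel/b.c"],
    "bug-2": ["fs/c.c", "mm/d.c"],
}
buggy_files = {
    "bug-1": {"drivers/net/a.c"},
    "bug-2": {"mm/d.c"},
}

print(top_k_accuracy(rankings, buggy_files, k=1))
print(top_k_accuracy(rankings, buggy_files, k=2))
```

In the toy data, only the first bug has its buggy file ranked first, so widening k from 1 to 2 raises the score; this is why top-1 is the strictest of the top-k figures.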