Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition

Software bugs cost technology providers (e.g., AT&T) billions of dollars annually, and developers spend roughly 50% of their time resolving them. Traditional bug localization methods often analyze the suspiciousness of code components (e.g., methods, documents) in isolation, overlooking their connections to other components in the codebase. Recent Large Language Models (LLMs) and agentic AI techniques show strong potential for code understanding, but they lack causal reasoning during code exploration and struggle to manage a growing context, which limits their effectiveness. In this paper, we present CogniGent, a novel agentic technique for bug localization that overcomes these limitations by leveraging multiple AI agents capable of causal reasoning, call-graph-based root cause analysis, and context engineering. It emulates developer-inspired debugging practices (i.e., dynamic cognitive debugging) and conducts hypothesis testing to support bug localization. We evaluate CogniGent on a curated dataset of 591 bug reports using three widely adopted performance metrics and compare it against six established baselines from the literature. Experimental results show that our technique consistently outperforms existing traditional and LLM-based techniques, improving MAP by 23.33-38.57% and MRR by 25.14-53.74% at the document and method levels. Statistical significance tests also confirm the superiority of our technique. By addressing the reasoning, dependency, and context limitations, CogniGent advances the state of bug localization, bridging human-like cognition with agentic automation for improved performance.
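For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how MAP (Mean Average Precision) and MRR (Mean Reciprocal Rank) are commonly computed for ranked bug localization output. It uses the textbook definitions and is not taken from the paper's evaluation code: the function names, the file names in the example, and the choice to normalize average precision by the number of ground-truth buggy components are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Return 1/rank of the first truly buggy component, or 0.0 if none is ranked."""
    for rank, component in enumerate(ranked, start=1):
        if component in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Average the precision at each rank where a buggy component appears.

    Assumption: normalize by the total number of buggy components, so
    components missing from the ranking lower the score.
    """
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, component in enumerate(ranked, start=1):
        if component in buggy:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy)


def mean_metric(
    metric: Callable[[Sequence[str], Set[str]], float],
    results: List[Tuple[Sequence[str], Set[str]]],
) -> float:
    """Average a per-report metric over all bug reports (gives MRR or MAP)."""
    if not results:
        return 0.0
    return sum(metric(ranked, buggy) for ranked, buggy in results) / len(results)


# Hypothetical example: two bug reports, each with a ranked list and ground truth.
example_results = [
    (["a.py", "b.py", "c.py"], {"b.py"}),          # first hit at rank 2
    (["x.py", "y.py", "z.py"], {"x.py", "z.py"}),  # hits at ranks 1 and 3
]

mrr = mean_metric(reciprocal_rank, example_results)      # (1/2 + 1) / 2 = 0.75
map_score = mean_metric(average_precision, example_results)  # (0.5 + (1 + 2/3)/2) / 2

print(f"MRR = {mrr:.4f}, MAP = {map_score:.4f}")
```

The same functions apply at either granularity: the ranked items can be documents or methods, as long as the ground-truth set uses the same identifiers.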
