Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors, typically a mix of real bugs and false positives, often persists and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively prompted LLMs often generate incorrect, contextually inappropriate edits. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors reported by a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It uses static analysis to identify safe and unsafe usage regions of symbols, and it supplies error-free usage examples as context in model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves 63% of the 1,119 nullability errors that remain after applying a state-of-the-art annotation inference technique. Unlike two baselines (a single-shot prompt and mini-SWE-agent), NullRepair also largely preserves program semantics: after applying every edit proposed by NullRepair, all unit tests pass in 10 of the 12 projects, and 98% or more of the tests pass in the remaining two.
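To make concrete the kind of residual error that can only be resolved by a code change, the following is a minimal, hypothetical Java sketch. It does not come from NullRepair or from the evaluated projects: the class name, method names, map contents, and fallback value are all assumptions made for illustration. It shows a dereference that a nullability checker can flag, next to one possible guarded rewrite.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from NullRepair
// or from the projects in the evaluation.
public class NullabilityRepairSketch {
    private static final Map<String, String> SETTINGS = new HashMap<>();

    static {
        SETTINGS.put("region", "eu-west");
    }

    // Map.get returns null when the key is absent, so a nullability checker
    // can treat `value` as nullable and report the dereference below.
    public static int lengthBeforeRepair(String key) {
        String value = SETTINGS.get(key);
        return value.length(); // possible dereference of a null value
    }

    // One possible repair: guard the dereference and return an explicit fallback.
    // The fallback of 0 is an assumption; the right choice depends on what
    // callers expect, which is why such edits need project context.
    public static int lengthAfterRepair(String key) {
        String value = SETTINGS.get(key);
        if (value == null) {
            return 0;
        }
        return value.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthAfterRepair("region"));
        System.out.println(lengthAfterRepair("missing"));
    }
}
```

The guarded version removes the checker error, but whether returning a default, throwing, or propagating nullability to callers is the semantically right fix depends on how the method is used elsewhere, which is the kind of decision the abstract attributes to project-wide context and decision logic.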