Application Programming Interfaces (APIs) facilitate the integration of third-party dependencies within the code of client applications. However, changes to an API, such as deprecation, modification of parameter names or types, or complete replacement with a new API, can break existing client code. These changes are called breaking dependency updates, and it is often tedious for API users to identify the cause of these breaks and update their code accordingly. In this paper, we explore the use of Large Language Models (LLMs) to automate client code updates in response to breaking dependency updates. We evaluate our approach on the BUMP dataset, a benchmark of breaking dependency updates in Java projects. Our approach leverages LLMs with advanced prompts, including information from the build process and from the breaking dependency analysis. We assess effectiveness at three granularity levels: the build level, the file level, and the individual compilation error level. We experiment with five LLMs: Google Gemini-2.0 Flash, OpenAI GPT-4o-mini, OpenAI o3-mini, Alibaba Qwen2.5-32b-instruct, and DeepSeek V3. Our results show that LLMs can automatically repair breaking dependency updates. Among the considered models, OpenAI's o3-mini performs best: it completely fixes 27% of the builds when using prompts that include contextual information such as the buggy line, API differences, error messages, and step-by-step reasoning instructions, and it fixes 78% of the individual compilation errors. Overall, our findings demonstrate the potential of LLMs to fix compilation errors caused by breaking dependency updates, supporting developers in their efforts to stay up-to-date with changes in their dependencies.