CodePlan: Repository-level Coding using LLMs and Planning

Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks. Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic framework, called CodePlan, to solve it. CodePlan synthesizes a multi-step chain of edits (a plan), where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes, and task-specific instructions. CodePlan is based on a novel combination of an incremental dependency analysis, a change may-impact analysis, and an adaptive planning algorithm. We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python). Each task is evaluated on multiple code repositories, each of which requires inter-dependent changes to many files (between 2 and 97 files). Coding tasks of this level of complexity have not been automated using LLMs before. Our results show that CodePlan achieves a better match with the ground truth than the baselines. CodePlan is able to get 5 of 6 repositories to pass the validity checks (e.g., to build without errors and make correct code edits), whereas the baselines (without planning but with the same type of contextual information as CodePlan) cannot get any of the repositories to pass them.
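The planning idea described above can be illustrated with a minimal sketch. This is not CodePlan's actual implementation. It assumes a static, precomputed dependency map (the abstract describes an incremental analysis that is updated after each edit), it visits each code location at most once, and it replaces the LLM call with a hypothetical stub. The names `plan_edits`, `dependents`, and `fake_llm_edit`, as well as the location names, are all invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def plan_edits(
    seeds: List[str],
    dependents: Dict[str, Set[str]],
    edit_fn: Callable[[str, List[str]], bool],
) -> List[str]:
    """Worklist-style planning sketch; returns locations edited, in order.

    seeds      -- code locations that the task instruction directly targets
    dependents -- hypothetical static map: location -> locations that use it
    edit_fn    -- stand-in for an LLM call; receives the location and the
                  history of previous edits, returns True if it changed code
    """
    worklist = deque(seeds)
    queued = set(seeds)          # simplification: each location queued once
    history: List[str] = []

    while worklist:
        loc = worklist.popleft()
        changed = edit_fn(loc, list(history))
        if not changed:
            continue             # no change, so nothing downstream is affected
        history.append(loc)
        # May-impact step: anything depending on an edited location might
        # now need an edit of its own, so it becomes a new plan step.
        for dep in sorted(dependents.get(loc, ())):
            if dep not in queued:
                queued.add(dep)
                worklist.append(dep)
    return history


# Hypothetical repository: two callers depend on api.get_user.
dependents = {
    "api.get_user": {"service.load_profile", "cli.show_user"},
    "service.load_profile": {"cli.show_user"},
}


def fake_llm_edit(loc: str, history: List[str]) -> bool:
    # Pretend the model decides cli.show_user needs no change.
    return loc != "cli.show_user"


order = plan_edits(["api.get_user"], dependents, fake_llm_edit)
print(order)
```

The point of the sketch is that the plan is not fixed up front: new edit steps are discovered only after an earlier edit actually changes code, which is what makes the planning adaptive.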
