Code completion is essential in software development, helping developers by predicting code snippets based on context. Among completion tasks, Method Body Completion (MBC) is particularly challenging because it involves generating complete method bodies from their signatures and context. The task becomes significantly harder in large repositories, where method bodies must integrate repository-specific elements such as custom APIs, inter-module dependencies, and project-specific conventions. In this paper, we introduce RAMBO, a novel RAG-based approach for repository-level MBC. Instead of retrieving similar method bodies, RAMBO identifies essential repository-specific elements, such as classes, methods, and variables/fields, together with their relevant usages. By incorporating these elements and usages into the code generation process, RAMBO produces more accurate and contextually relevant method bodies. Our experimental results with leading code LLMs across 40 Java projects show that RAMBO significantly outperformed state-of-the-art repository-level MBC approaches, with improvements of up to 46% in BLEU, 57% in CodeBLEU, 36% in Compilation Rate, and up to 3X in Exact Match. Notably, RAMBO surpassed the RepoCoder Oracle method by up to 12% in Exact Match, setting a new benchmark for repository-level MBC.
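To make the idea of retrieving repository-specific elements and their usages (rather than similar method bodies) more concrete, the following is a minimal sketch and not RAMBO's actual implementation. It assumes a pre-built index of repository elements; the names `RepoElement`, `identify_elements`, and `build_prompt`, the identifier-matching heuristic, and the toy Java snippets are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class RepoElement:
    """A repository-specific element (hypothetical structure for this sketch)."""
    name: str
    kind: str                 # "class", "method", or "field"
    declaration: str          # source text of the declaration
    usages: list[str] = field(default_factory=list)  # example usage snippets


def identify_elements(text: str, index: dict[str, RepoElement]) -> list[RepoElement]:
    """Return indexed elements whose names occur as identifiers in the text.

    Assumption: plain identifier matching stands in for whatever
    element-identification step a real system would use.
    """
    identifiers = set(re.findall(r"[A-Za-z_]\w*", text))
    # Sort by name so the resulting prompt is deterministic.
    return [index[name] for name in sorted(identifiers.intersection(index))]


def build_prompt(signature: str, context: str, index: dict[str, RepoElement],
                 max_usages: int = 2) -> str:
    """Assemble a prompt from relevant elements, their usages, and the signature."""
    parts = []
    for element in identify_elements(context + " " + signature, index):
        parts.append(f"// {element.kind} {element.name}\n{element.declaration}")
        for usage in element.usages[:max_usages]:
            parts.append(f"// usage of {element.name}\n{usage}")
    parts.append("// complete the body of this method\n" + signature + " {")
    return "\n\n".join(parts)


# Toy repository index (hypothetical Java snippets).
index = {
    "UserRepository": RepoElement(
        name="UserRepository",
        kind="class",
        declaration="public class UserRepository {\n"
                    "    public User findById(long id) { return users.get(id); }\n"
                    "}",
        usages=["User u = userRepository.findById(42L);"],
    ),
    "AuditLog": RepoElement(
        name="AuditLog",
        kind="class",
        declaration="public class AuditLog {\n"
                    "    public void record(String event) { events.add(event); }\n"
                    "}",
        usages=['auditLog.record("login");'],
    ),
}

context = "private UserRepository userRepository;"
signature = "public String displayName(long userId)"
prompt = build_prompt(signature, context, index)
print(prompt)
```

In this toy run only `UserRepository` is referenced in the method's context, so its declaration and one usage are placed ahead of the signature, while the unrelated `AuditLog` class is left out. The assembled prompt would then be passed to a code LLM, a step that is omitted here.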