RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repository-level code completion aims to complete unfinished code snippets using the context of a given repository. Because input sequence length is limited, existing approaches rely mainly on retrieval-augmented generation. However, traditional lexical retrieval methods such as BM25 struggle to capture code semantics, while model-based retrieval methods are hampered by the lack of labeled training data. We therefore propose RLCoder, a novel reinforcement learning framework that enables the retriever to learn to retrieve useful content for code completion without labeled data. Specifically, we iteratively evaluate the usefulness of retrieved content by the perplexity of the target code when that content is supplied as additional context, and we feed this signal back to update the retriever's parameters. This iterative process lets the retriever learn from its successes and failures, gradually improving its ability to retrieve relevant, high-quality content. Because not every completion requires information from outside the current file, and not all retrieved context helps generation, we also introduce a stop signal mechanism that allows the retriever to decide autonomously when to retrieve and which candidates to retain. Extensive experiments demonstrate that RLCoder consistently outperforms state-of-the-art methods on CrossCodeEval and RepoEval, achieving a 12.2% improvement in exact match (EM) over previous methods. Moreover, experiments show that our framework generalizes across programming languages and can further improve previous methods such as RepoCoder. We provide the code and data at https://github.com/DeepSoftwareAnalytics/RLCoder.
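
To make the perplexity-based feedback concrete, the following is a minimal sketch and not the released RLCoder implementation. It assumes a toy scorer, `toy_token_logprobs`, standing in for a real code language model, and it models the stop signal as an empty candidate that competes with the retrieved ones. Both choices, and every function name below, are illustrative assumptions. The candidate under which the target code has the lowest perplexity would be treated as the most useful.

```python
import math

STOP = ""  # hypothetical stop-signal candidate meaning "retrieve nothing"


def toy_token_logprobs(context_tokens, target_tokens):
    """Toy stand-in for a code LM (assumption): a target token is
    considered likelier if it already appears in the context."""
    seen = set(context_tokens)
    return [math.log(0.5 if tok in seen else 0.05) for tok in target_tokens]


def perplexity(logprobs):
    """Perplexity = exp of the negative mean token log-probability.
    Expects a non-empty list."""
    return math.exp(-sum(logprobs) / len(logprobs))


def score_candidates(prefix, target, candidates):
    """Map each candidate (plus STOP) to the perplexity of the target
    when that candidate is prepended to the unfinished code."""
    scores = {}
    for cand in [STOP] + list(candidates):
        context = (cand + " " + prefix).split()
        scores[cand] = perplexity(toy_token_logprobs(context, target.split()))
    return scores


def best_candidate(scores):
    """Lowest perplexity wins; on ties the earliest entry (STOP) is kept."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    prefix = "def area ( r ) : return"
    scores = score_candidates(prefix, "math . pi * r * r",
                              ["import math", "import os"])
    for cand, ppl in scores.items():
        print(repr(cand), round(ppl, 2))
    print("selected:", repr(best_candidate(scores)))
```

In this sketch, a candidate that fails to lower the perplexity below the STOP baseline is never selected, which mirrors the idea that retrieval should be skipped when it does not help. How the resulting signal updates the retriever's parameters is left out, since the abstract does not specify it.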
