Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

Recent code large language models (LLMs) have shown promising performance in generating standalone functions but face limitations in repository-level code generation due to their lack of awareness of repository-level dependencies (e.g., user-defined attributes), resulting in dependency errors such as undefined-variable and no-member errors. In this work, we introduce ToolGen, an approach that integrates autocompletion tools into the code LLM generation process to address these dependencies. ToolGen comprises two main phases: Data Augmentation and Model Fine-tuning (Offline), and Tool-integrated Code Generation (Online). During the offline phase, ToolGen augments functions within a given code corpus with a special mark token, indicating positions to trigger autocompletion tools. These augmented functions, along with their corresponding docstrings, are then used to fine-tune a selected code LLM. In the online phase, ToolGen iteratively generates functions by predicting tokens step-by-step using the fine-tuned LLM. Whenever a mark token is encountered, ToolGen invokes the autocompletion tool to suggest code completions and selects the most appropriate one. We conduct comprehensive experiments to evaluate ToolGen's effectiveness in repository-level code generation. To facilitate this evaluation, we create a benchmark comprising 680 real-world code repositories and introduce two new repository-level metrics: Dependency Coverage and Success Rate. The results demonstrate that ToolGen significantly improves dependency coverage by 15.2% to 45.8% and success rates by 10.9% to 42.2% across three distinct code LLMs, while maintaining competitive performance in widely-recognized similarity metrics. Furthermore, our generalizability evaluation confirms ToolGen's consistent performance when applied to diverse code LLMs, including various model architectures and scales.
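The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the mark token spelling `<COMP>`, the end-of-sequence token, and the callables `next_token`, `complete`, and `choose` are hypothetical stand-ins for the fine-tuned LLM, the autocompletion tool, and the candidate-selection step, none of which the abstract specifies in detail.

```python
from typing import Callable, List, Sequence, Set

MARK = "<COMP>"  # hypothetical spelling of the special mark token
EOS = "<EOS>"    # hypothetical end-of-sequence token


def augment(tokens: Sequence[str], dependency_positions: Set[int]) -> List[str]:
    """Offline phase (sketch): insert MARK before each token whose index is
    flagged as a position where the autocompletion tool should be triggered."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in dependency_positions:
            out.append(MARK)
        out.append(tok)
    return out


def generate(
    next_token: Callable[[List[str], bool], str],     # stand-in for the fine-tuned LLM
    complete: Callable[[List[str]], List[str]],       # stand-in for the autocompletion tool
    choose: Callable[[List[str], List[str]], str],    # stand-in for candidate selection
    prompt: Sequence[str],
    max_steps: int = 64,
) -> List[str]:
    """Online phase (sketch): decode token by token; whenever the model emits
    MARK, ask the tool for candidates and append the selected one instead."""
    out = list(prompt)
    allow_mark = True
    for _ in range(max_steps):
        tok = next_token(out, allow_mark)
        allow_mark = True
        if tok == EOS:
            break
        if tok == MARK:
            candidates = complete(out)
            if candidates:
                out.append(choose(out, candidates))
            else:
                # Assumption: with no suggestions, the model continues unaided.
                allow_mark = False
            continue
        out.append(tok)
    return out
```

In this sketch the mark token never reaches the output: it only signals where control passes from the model to the tool. Falling back to plain decoding when the tool returns no candidates is an assumption made here to keep the loop well-defined.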

