Code completion has become an essential tool in daily software development. Existing evaluation benchmarks often employ static methods that fail to capture the dynamic nature of real-world coding environments, and they face significant challenges, including limited context length, reliance on superficial evaluation metrics, and potential overfitting to training datasets. In this work, we introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark, ExecRepoBench, and an instruction corpus, Repo-Instruct, aimed at improving the capabilities of open-source large language models (LLMs) in real-world coding scenarios that involve complex interdependencies across multiple files. ExecRepoBench comprises 1.2K samples drawn from active Python repositories. In addition, we present a multi-level grammar-based completion methodology conditioned on the abstract syntax tree, which masks code fragments at various logical units (e.g., statements, expressions, and functions). We then fine-tune an open-source 7B-parameter LLM on Repo-Instruct to produce Qwen2.5-Coder-Instruct-C, a strong code completion baseline model. Qwen2.5-Coder-Instruct-C is rigorously evaluated against existing benchmarks, including MultiPL-E and ExecRepoBench, and consistently outperforms prior baselines across all evaluated programming languages. \ourmethod{} can be deployed as a high-performance, local service for programming development\footnote{\url{https://execrepobench.github.io/}}.
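The grammar-based masking step described above can be sketched with Python's standard `ast` module: parse a file, pick a node at a chosen logical level (statement, expression, or function), and replace its exact source span with a mask token. This is a minimal illustrative sketch; the node categories and the `<MASK>` token are assumptions, not the paper's exact scheme.

```python
import ast
import random

MASK = "<MASK>"

def mask_random_unit(source: str, seed: int = 0) -> tuple[str, str]:
    """Mask one logical unit (a statement, expression, or function)
    selected from the module's AST.

    Returns (masked_source, ground_truth), where ground_truth is the
    exact source text that was masked out.
    """
    tree = ast.parse(source)
    # Candidate units: any statement or expression with a known end position.
    candidates = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.stmt, ast.expr))
        and getattr(n, "end_lineno", None) is not None
    ]
    node = random.Random(seed).choice(candidates)

    # Convert (line, column) AST positions to flat character offsets.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    begin = line_starts[node.lineno - 1] + node.col_offset
    end = line_starts[node.end_lineno - 1] + node.end_col_offset

    ground_truth = source[begin:end]
    masked = source[:begin] + MASK + source[end:]
    return masked, ground_truth
```

Because the mask replaces an exact character span, substituting the ground truth back into the masked source reconstructs the original file, which makes execution-based checking of a model's completion straightforward.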