Pretrained code language models have enabled great progress towards program synthesis. However, common approaches consider only in-file local context and thus miss information and constraints imposed by other parts of the codebase and by its external dependencies. Existing code completion benchmarks also lack such context. To address these limitations, we curate a new dataset of permissively licensed Python packages that includes full projects and their dependencies, and we provide tools to extract non-local information with the help of program analyzers. We then focus on the task of function call argument completion, which requires predicting the arguments to function calls. We show that existing code completion models do not yield good results on this task. To solve it better, we query a program analyzer for information relevant to a given function call and consider ways to provide the analyzer results to different code completion models during inference and training. Our experiments show that providing access to the function implementation and function usages substantially improves argument completion performance. Our ablation study provides further insight into how the different types of information available from the program analyzer, and the different ways of incorporating that information, affect model performance.
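To make the task concrete, the following minimal sketch shows one way a source file could be split into argument completion examples: for each function call, the text up to and including the opening parenthesis serves as context, and the argument text is the prediction target. It relies only on Python's standard `ast` module. The helper name `argument_completion_examples` is hypothetical, and the sketch is not the tooling released with the dataset; it uses in-file context only and does not include the non-local information (function implementation, usages) that a program analyzer would supply.

```python
import ast


def argument_completion_examples(source: str) -> list[tuple[str, str]]:
    """Return (prefix, target) pairs, one per function call in `source`.

    prefix: source text up to and including the call's opening parenthesis.
    target: the argument text a completion model would have to predict.

    Hypothetical helper for illustration. It assumes no comment containing
    "(" sits between the callee expression and the opening parenthesis.
    """
    tree = ast.parse(source)

    # ast column offsets are UTF-8 byte offsets, so index into the bytes.
    data = source.encode("utf-8")
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(lineno: int, col: int) -> int:
        # Convert a 1-based line number and byte column to an absolute offset.
        return line_starts[lineno - 1] + col

    found = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func_end = offset(node.func.end_lineno, node.func.end_col_offset)
        call_end = offset(node.end_lineno, node.end_col_offset)
        # The first "(" after the callee expression opens the argument list;
        # the call node ends on the matching ")".
        open_idx = data.index(b"(", func_end)
        prefix = data[: open_idx + 1].decode("utf-8")
        target = data[open_idx + 1 : call_end - 1].decode("utf-8")
        found.append((open_idx, prefix, target))

    # Report calls in the order their argument lists open in the file.
    found.sort(key=lambda item: item[0])
    return [(prefix, target) for _, prefix, target in found]


if __name__ == "__main__":
    demo = "import os\npath = os.path.join(base, 'a.txt')\nprint(path)\n"
    for prefix, target in argument_completion_examples(demo):
        print(repr(prefix[-20:]), "->", repr(target))
```

In this framing, analyzer output such as the callee's implementation or other usages of the same function would be supplied to the model alongside the prefix; how that is best done is the question the experiments and ablations address.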