Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code completion. However, these models are mostly unaware of the code that already exists within a specific project, which prevents them from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. Although the API information is available to IDEs, the input size limit of LLMs prevents code completion techniques from including all relevant context in the prompt. This paper presents De-Hallucinator, an LLM-based code completion technique that grounds the predictions of a model through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that LLMs often predict code that resembles the desired completion but fails to refer correctly to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the code prefix and to the model's initial predictions and adds these references to the prompt. Our evaluation applies the approach to the task of predicting API usages in open-source Python projects. We show that De-Hallucinator consistently improves the predicted code across four state-of-the-art LLMs compared to querying the model only with the code before the cursor. In particular, relative to the baseline, the approach improves the edit distance of the predicted code by 23-51% and the recall of correctly predicted API usages by 24-61%.
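The grounding loop described above (query with the prefix, retrieve project APIs that resemble the prediction, re-query with those references in the prompt) can be sketched in Python. This is a minimal illustration under stated assumptions, not the De-Hallucinator implementation: the model is assumed to be any callable from a prompt string to a completion string, `difflib` string similarity stands in for the paper's API retrieval, retrieved references are assumed to be prepended to the prompt as `# API:` comments, and all names (`retrieve_apis`, `build_prompt`, `complete`, `stub_model`, and the `store.*` / `log.*` APIs) are hypothetical.

```python
import difflib
import re
from typing import Callable, List

# Assumption: a "model" is any callable mapping a prompt to a completion.
Model = Callable[[str], str]

# Matches plain or dotted identifiers such as "uid" or "store.get_user".
IDENT = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")


def retrieve_apis(code: str, api_index: List[str], k: int = 3) -> List[str]:
    """Rank project API signatures by their best string similarity to any
    identifier in `code` and return the top k (a simple stand-in retriever)."""
    best = {api: 0.0 for api in api_index}
    for ident in IDENT.findall(code):
        for api in api_index:
            name = api.split("(")[0]  # compare against the name, not the parameters
            score = difflib.SequenceMatcher(None, ident, name).ratio()
            best[api] = max(best[api], score)
    return sorted(api_index, key=lambda api: best[api], reverse=True)[:k]


def build_prompt(prefix: str, apis: List[str]) -> str:
    """Prepend the retrieved API references as comments before the code prefix."""
    header = "".join(f"# API: {api}\n" for api in apis)
    return header + prefix


def complete(prefix: str, model: Model, api_index: List[str],
             rounds: int = 2, k: int = 3) -> str:
    """Query once with the prefix alone (the baseline), then re-query with
    APIs retrieved from the prefix plus the latest prediction."""
    completion = model(prefix)
    for _ in range(rounds):
        apis = retrieve_apis(prefix + completion, api_index, k)
        completion = model(build_prompt(prefix, apis))
    return completion


# Hypothetical project APIs and a stub model that hallucinates "get_user"
# unless the prompt mentions the real "fetch_user" API.
apis = ["store.fetch_user(user_id)", "store.delete_user(user_id)", "log.write(msg)"]


def stub_model(prompt: str) -> str:
    return "store.fetch_user(uid)" if "store.fetch_user" in prompt else "store.get_user(uid)"


print(complete("uid = 42\nuser = ", stub_model, apis, k=2))  # prints: store.fetch_user(uid)
```

The stub model mimics the failure mode the approach targets: it produces a plausible but non-existent `get_user` call until a retrieved reference to the real `fetch_user` API appears in the prompt. Because retrieval is keyed on the model's own (wrong) prediction, the near-miss name is enough to surface the correct API.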