De-Hallucinator: Iterative Grounding for LLM-Based Code Completion

Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code completion. However, these models are mostly unaware of the code that already exists within a specific project, which prevents them from making good use of existing APIs. Instead, LLMs often invent, or "hallucinate", non-existent APIs or produce variants of already existing code. Although the API information is available to IDEs, the input size limit of LLMs prevents code completion techniques from including all relevant context in the prompt. This paper presents De-Hallucinator, an LLM-based code completion technique that grounds the predictions of a model through a novel combination of retrieving suitable API references and iteratively querying the model with increasingly suitable context information in the prompt. The approach exploits the observation that LLMs often predict code that resembles the desired completion but fails to refer correctly to already existing APIs. De-Hallucinator automatically identifies project-specific API references related to the code prefix and to the model's initial predictions, and adds these references to the prompt. Our evaluation applies the approach to the task of predicting API usages in open-source Python projects. We show that De-Hallucinator consistently improves the predicted code across four state-of-the-art LLMs compared to querying the model with only the code before the cursor. In particular, the approach improves the edit distance of the predicted code by 23-51% and the recall of correctly predicted API usages by 24-61% relative to the baseline.
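The retrieve-then-requery loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the token-overlap similarity stands in for whatever retrieval method De-Hallucinator actually uses, `toy_model` is a hard-coded stand-in for an LLM, and the names `retrieve`, `build_prompt`, `complete_with_grounding`, and the example API signatures are hypothetical.

```python
import re
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric fragments; snake_case names split at underscores."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of token sets (an assumed, deliberately simple metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query: str, api_refs: List[str], k: int = 2) -> List[str]:
    """Return the k project API references most similar to the query."""
    ranked = sorted(api_refs, key=lambda ref: similarity(query, ref), reverse=True)
    return ranked[:k]


def build_prompt(prefix: str, refs: List[str]) -> str:
    """Prepend the retrieved references as comments, then the code prefix."""
    header = "".join(f"# API reference: {ref}\n" for ref in refs)
    return header + prefix


def complete_with_grounding(
    prefix: str,
    model: Callable[[str], str],
    api_refs: List[str],
    k: int = 2,
    max_iters: int = 3,
) -> str:
    """Query the model, retrieve references for its prediction, and re-query."""
    completion = model(prefix)  # baseline: only the code before the cursor
    for _ in range(max_iters):
        # Retrieval uses both the prefix and the current prediction.
        refs = retrieve(prefix + "\n" + completion, api_refs, k)
        new_completion = model(build_prompt(prefix, refs))
        if new_completion == completion:
            break  # prediction is stable; further iterations add nothing
        completion = new_completion
    return completion


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it only uses the project API if it
    sees that API in the prompt; otherwise it invents a plausible name."""
    if "load_user_config" in prompt:
        return "load_user_config(path)"
    return "read_config_file(path)"  # resembles the target, but does not exist


# Hypothetical project APIs and code prefix, for illustration only.
api_refs = [
    "def load_user_config(path: str) -> dict",
    "def save_report(report: Report, out_dir: str) -> None",
    "def parse_args(argv: list) -> Namespace",
]
prefix = "path = 'settings.toml'\nconfig = "

print(complete_with_grounding(prefix, toy_model, api_refs))
```

In this toy setting the first prediction names a non-existent function, but it shares enough tokens with the real signature for that signature to be retrieved and added to the prompt, so the second query returns the existing API. A real system would replace `toy_model` with an actual LLM call and would likely need a stronger retrieval method than token overlap.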
