Large language models have made substantial progress on diverse code-related tasks. However, their adoption is hindered by inconsistent outputs caused by a lack of real-world, domain-specific information, such as intra-repository API calls in unseen software projects. We introduce a novel technique that mitigates hallucinations by leveraging global and local contextual information within a code repository for API completion tasks, with a focus on optimizing local API completions. During API completion, we examine the relevant import statements to derive insights into local APIs from their method signatures. For API token completion, we analyze inline variables and correlate them with the appropriate imported modules, allowing our approach to rank the most contextually relevant suggestions from the available local APIs. Furthermore, for conversational API completion, we gather the APIs most relevant to the developer's query via a retrieval-based search across the project. We evaluate our tool, LANCE, on our proposed benchmark, APIEval, which spans two programming languages. Our evaluation yields an average accuracy of 82.6% for API token completion and 76.9% for conversational API completion. On average, LANCE surpasses Copilot by 143% and 142% on API token completion and conversational API completion, respectively. The implications of our findings are substantial for developers: our lightweight context analysis can be applied to multilingual environments without language-specific training or fine-tuning, allowing efficient implementation with minimal examples and effort.
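The core idea of linking inline variables to imported modules and ranking local API candidates can be sketched as follows. This is a minimal illustration, not the paper's implementation: the module name `net_utils`, the toy source snippet, and the `candidates` signature map are all hypothetical, and real import analysis would also cover `from ... import` forms and cross-file resolution.

```python
import ast

# Toy repository file: the developer is about to complete an API call on `conn`.
source = """
import json
import net_utils as net

conn = net.Client()
"""

tree = ast.parse(source)

# 1. Collect imported module aliases (global context of the file).
imports = {}
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        for alias in node.names:
            imports[alias.asname or alias.name] = alias.name

# 2. Correlate inline variables with the imported module that produced them
#    (local context): here, conn -> net_utils.
var_to_module = {}
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
        func = node.value.func
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            root = func.value.id
            if root in imports:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        var_to_module[target.id] = imports[root]

# 3. Rank candidate APIs: signatures from the linked module come first.
#    `candidates` stands in for method signatures mined from the repository.
candidates = {
    "net_utils": ["connect(host)", "send(payload)", "close()"],
    "json": ["dumps(obj)", "loads(s)"],
}

def rank(variable):
    module = var_to_module.get(variable)
    best = candidates.get(module, [])
    rest = [s for m, sigs in candidates.items() if m != module for s in sigs]
    return best + rest

print(rank("conn"))  # net_utils signatures ranked ahead of json's
```

The ranking step is deliberately simple; the point is that lightweight static analysis of imports and assignments, with no training or fine-tuning, already narrows the candidate set the language model must choose from.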