Large language models have made substantial progress on diverse code-related tasks. However, their adoption is hindered by inconsistent output when they lack real-world, domain-specific information, such as intra-repository API calls in unseen software projects. We introduce a novel technique that mitigates hallucinations in API completion tasks by leveraging global and local contextual information within a code repository. Our approach targets code completion, with a focus on local API completions. During API completion, we examine the relevant import statements to derive insights into local APIs from their method signatures. For API token completion, we analyze inline variables and correlate them with the appropriate imported modules, which allows our approach to rank the most contextually relevant suggestions among the available local APIs. For conversational API completion, we gather the APIs most relevant to the developer query using a retrieval-based search across the project. We evaluate our tool, LANCE, on our proposed benchmark, APIEval, which covers two programming languages. Our evaluation yields an average accuracy of 82.6% for API token completion and 76.9% for conversational API completion. On average, LANCE surpasses Copilot by 143% and 142% on API token completion and conversational API completion, respectively. For developers, these findings suggest that our lightweight context analysis can be applied to multilingual environments without language-specific training or fine-tuning, allowing efficient implementation with minimal examples and effort.
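The import-driven API token completion described in the abstract can be illustrated with a minimal sketch. This is not LANCE's implementation: the function names, the in-memory `repo_modules` mapping, and the scoring rule (counting parameter names that match variables in scope) are all assumptions made for illustration. The sketch resolves the receiver of a call to a repository-local module through the file's import statements, reads that module's function signatures, and ranks the candidates that match the typed prefix.

```python
import ast


def local_api_signatures(module_source: str) -> dict:
    """Map each top-level function in a module to its list of parameter names."""
    apis = {}
    for node in ast.parse(module_source).body:
        if isinstance(node, ast.FunctionDef):
            apis[node.name] = [a.arg for a in node.args.args]
    return apis


def resolve_local_imports(import_block: str, repo_modules: dict) -> dict:
    """Return alias -> module name for `import x [as y]` lines that point into the repo."""
    aliases = {}
    for node in ast.walk(ast.parse(import_block)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in repo_modules:  # skip third-party and stdlib imports
                    aliases[alias.asname or alias.name] = alias.name
    return aliases


def complete_api_token(import_block, receiver, prefix, variables_in_scope, repo_modules):
    """Rank local APIs of the module bound to `receiver` that start with `prefix`.

    Hypothetical scoring: APIs whose parameter names overlap more with the
    variables currently in scope rank higher; ties are broken alphabetically.
    """
    module = resolve_local_imports(import_block, repo_modules).get(receiver)
    if module is None:
        return []
    scored = []
    for name, params in local_api_signatures(repo_modules[module]).items():
        if name.startswith(prefix):
            overlap = len(set(params) & set(variables_in_scope))
            scored.append((-overlap, name, f"{name}({', '.join(params)})"))
    return [signature for _, _, signature in sorted(scored)]


# Hypothetical repository with a single local module, kept in memory for brevity.
repo_modules = {
    "billing": (
        "def compute_total(items, tax_rate):\n    return 0\n\n"
        "def compute_discount(items, coupon):\n    return 0\n\n"
        "def send_invoice(customer):\n    return None\n"
    )
}
imports = "import billing as b\nimport os\n"

# The developer has typed `b.compute` with `items` and `coupon` in scope.
suggestions = complete_api_token(imports, "b", "compute", {"items", "coupon"}, repo_modules)
print(suggestions)
```

Because the candidates come only from signatures that exist in the repository, a completion built this way cannot suggest an API the project does not define. A real system would also need to handle `from ... import ...` forms, classes and methods, and languages other than Python.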