Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their central role in code assistant services such as Copilot. Because they are trained on in-file contexts, current LLMs are quite effective at completing code within a single source file. However, they struggle with repository-level code completion for large software projects, which requires cross-file information. Existing research on LLM-based repository-level code completion identifies and integrates cross-file contexts, but it suffers from low accuracy and is constrained by the limited context length of LLMs. In this paper, we argue that Integrated Development Environments (IDEs) can provide direct, accurate, and real-time cross-file information for repository-level code completion. We propose IDECoder, a practical framework that leverages IDE-native static contexts for cross-context construction and IDE diagnosis results for self-refinement. IDECoder uses the rich cross-context information available in IDEs to enhance the capabilities of LLMs for repository-level code completion. We conducted preliminary experiments to validate the performance of IDECoder and observed that this synergy between IDEs and LLMs is a promising direction for future exploration.
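The two ideas named in the abstract, building a prompt from IDE-provided cross-file context and refining a completion with IDE diagnostics, can be illustrated with a minimal sketch. This is not IDECoder's implementation: every name here (`Diagnostic`, `build_prompt`, `complete_with_refinement`, the toy model and toy diagnoser, the prompt layout, and the round limit) is a hypothetical stand-in, and a real system would obtain symbols and diagnostics from an actual IDE and completions from an actual LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Diagnostic:
    """One problem reported by the (hypothetical) IDE for a candidate completion."""
    message: str


def build_prompt(prefix: str, symbols: Dict[str, str], feedback: List[Diagnostic]) -> str:
    """Combine cross-file symbol signatures, prior diagnostics, and in-file code."""
    parts = ["# Cross-file context supplied by the IDE:"]
    parts += [f"# {name}: {signature}" for name, signature in sorted(symbols.items())]
    if feedback:
        parts.append("# Diagnostics on the previous attempt:")
        parts += [f"# - {d.message}" for d in feedback]
    parts.append(prefix)
    return "\n".join(parts)


def complete_with_refinement(
    prefix: str,
    symbols: Dict[str, str],
    model: Callable[[str], str],
    diagnose: Callable[[str], List[Diagnostic]],
    max_rounds: int = 3,
) -> str:
    """Query the model, then re-query with diagnostics until none remain
    or the round limit (an assumed parameter) is reached."""
    feedback: List[Diagnostic] = []
    completion = ""
    for _ in range(max_rounds):
        completion = model(build_prompt(prefix, symbols, feedback))
        feedback = diagnose(prefix + completion)
        if not feedback:
            break
    return completion


# Toy stand-ins so the sketch runs without an IDE or an LLM.
SYMBOLS = {"load_config": "def load_config(path: str) -> dict"}


def toy_model(prompt: str) -> str:
    # Guesses a wrong name first; corrects itself once diagnostics appear in the prompt.
    if "Diagnostics on the previous attempt" in prompt:
        return "load_config('app.toml')"
    return "read_config('app.toml')"


def toy_diagnose(code: str) -> List[Diagnostic]:
    # Flags a call to a name that the toy symbol table does not define.
    if "read_config" in code:
        return [Diagnostic("undefined name 'read_config'")]
    return []


result = complete_with_refinement("cfg = ", SYMBOLS, toy_model, toy_diagnose)
print(result)
```

In this toy run the first candidate calls an undefined function, the diagnostic is fed back into the next prompt, and the second candidate uses the symbol that the cross-file context actually declares. Passing the model and the diagnoser as plain callables keeps the loop independent of any particular IDE or LLM backend.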