Large Language Models (LLMs) have demonstrated impressive capabilities in code completion tasks, where they assist developers by predicting and generating new code in real time. However, existing LLM-based code completion systems primarily rely on the immediate context of the file being edited, often missing valuable repository-level information, user behavior, and edit history that could improve suggestion accuracy. Additionally, challenges such as efficiently retrieving relevant code snippets from large repositories, incorporating user behavior, and balancing accuracy with low-latency requirements in production environments remain unresolved. In this paper, we propose ContextModule, a framework designed to enhance LLM-based code completion by retrieving and integrating three types of contextual information from the repository: user behavior-based code, similar code snippets, and critical symbol definitions. By capturing user interactions across files and leveraging repository-wide static analysis, ContextModule improves the relevance and precision of generated code. We implement performance optimizations, such as index caching, to ensure the system meets the latency constraints of real-world coding environments. Experimental results and industrial practice demonstrate that ContextModule significantly improves code completion accuracy and user acceptance rates.