Unlike code generation, which creates code from scratch, code completion integrates new lines or blocks of code into an existing codebase. Doing this well requires a deep understanding of the surrounding context, such as variable scope, object models, API calls, and database relations. These contextual dependencies make code completion a particularly challenging problem. Current models and approaches often fail to incorporate such context effectively, leading to inaccurate completions with low acceptance rates (around 30%). For tasks like data transfer, which rely heavily on specific relationships and data structures, acceptance rates drop even further. This study introduces CCCI, a novel method for generating context-aware code completions, designed specifically for data transfer tasks. By integrating contextual information, such as database table relationships, object models, and library details, into Large Language Models (LLMs), CCCI improves the accuracy of code completions. We evaluate CCCI using 289 Java snippets extracted from over 819 operational scripts in an industrial setting. The results show that CCCI achieves a 49.1% Build Pass rate and a 41.0% CodeBLEU score, comparable to state-of-the-art methods, which often struggle with complex task completion.
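To make the idea of integrating context into an LLM concrete, the following is a minimal sketch of one plausible way to do it: prepending the three kinds of context named above to the code prefix as comments. It is not the CCCI implementation, whose prompt format is not described here. The class name, method names, section titles, and sample schema are all hypothetical and chosen only for illustration.

```java
import java.util.List;

public class ContextPromptSketch {

    // Hypothetical helper: builds a completion prompt by placing context
    // (table relationships, object models, library details) ahead of the
    // code prefix. The layout is an assumption, not the format used by CCCI.
    static String buildPrompt(List<String> tableRelations,
                              List<String> objectModels,
                              List<String> libraryDetails,
                              String codePrefix) {
        StringBuilder sb = new StringBuilder();
        appendSection(sb, "Database table relationships", tableRelations);
        appendSection(sb, "Object models", objectModels);
        appendSection(sb, "Library details", libraryDetails);
        sb.append("// Complete the following Java code:\n");
        sb.append(codePrefix);
        return sb.toString();
    }

    // Emits one commented section; empty sections are skipped entirely.
    private static void appendSection(StringBuilder sb, String title, List<String> items) {
        if (items.isEmpty()) {
            return;
        }
        sb.append("// ").append(title).append(":\n");
        for (String item : items) {
            sb.append("//   ").append(item).append('\n');
        }
    }

    public static void main(String[] args) {
        // Sample context for an imagined data transfer script (illustrative only).
        String prompt = buildPrompt(
                List.of("orders.customer_id -> customers.id"),
                List.of("class Order { long id; long customerId; }"),
                List.of("java.sql.PreparedStatement"),
                "PreparedStatement ps = conn.prepareStatement(");
        System.out.println(prompt);
    }
}
```

The resulting string would be passed to whichever LLM performs the completion. Keeping the context in comments leaves the prompt syntactically valid Java, which is a design choice of this sketch rather than a claim about the method evaluated in the study.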