From Context to Intent: Reasoning-Guided Function-Level Code Completion

The growing capabilities of Large Language Models (LLMs) have led to their widespread adoption for function completion within code repositories. Recent studies on this task report promising results when explicit instructions, often in the form of docstrings, are available to guide the completion. In real-world scenarios, however, clear docstrings are frequently absent, and under such conditions LLMs typically fail to produce accurate completions. To make function completion more automated and accurate in this setting, we aim to have LLMs infer the developer's intent before generating code. Our key insight is that the preceding code, namely the code context before the function to be completed, often contains valuable cues about the intended functionality. Inferring intent from such implicit context is non-trivial, however, and constitutes a core challenge in function-level code completion. To tackle this challenge, and inspired by how humans interpret context, we propose a reasoning-based prompting framework that guides LLMs to use these contextual cues to infer intent step by step. To incentivize LLMs to reason through the preceding code and infer intent, we further curate a dataset of 40k examples, each annotated with intermediate reasoning traces and corresponding docstrings. Extensive experiments on DevEval and ComplexCodeEval demonstrate consistent performance improvements across multiple models, with relative gains of over 25% in pass@1 for both the DeepSeekCoder and CodeLLaMA families. Building on this framework, we further develop an intent-interactive platform that supports lightweight human feedback: developers can select from a set of candidate intents or edit the inferred intent to better guide the model. Our experiments show that this interactive approach yields further performance improvements.
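The results above are stated as relative gains in pass@1. The abstract does not say which estimator or how many samples per task the evaluation uses, so the following is only a minimal sketch under an assumption: it uses the widely used unbiased pass@k estimator introduced with the HumanEval benchmark, 1 - C(n-c, k)/C(n, k), and computes a relative gain as (improved - baseline) / baseline. The function names `pass_at_k`, `mean_pass_at_k`, and `relative_gain` are hypothetical and not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task (assumed estimator): n samples, c correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline; 0.25 corresponds to a 25% gain."""
    return (improved - baseline) / baseline
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples for a task. A "25% relative gain" therefore means the improved score is 1.25 times the baseline score, for example 0.25 versus 0.20, not an increase of 25 percentage points.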

翻译：大型语言模型（LLMs）能力的不断增强使其在代码仓库中广泛用于函数补全。近期针对此类任务的研究表明，当存在明确的指令（通常以文档字符串形式）指导补全时，能获得令人满意的结果。然而在实际场景中，清晰的文档字符串往往缺失。在此类条件下，LLMs通常无法生成准确的补全。为在缺乏文档字符串的环境中实现更自动化和准确的函数补全，我们旨在使LLMs在代码补全前准确推断开发者的意图。我们的核心观点是：待补全函数之前的代码（即代码上下文）通常包含有价值线索，有助于模型理解预期功能。但从这种隐式上下文中推断意图并非易事，这构成了函数级代码补全的核心挑战。为应对该挑战，受人类理解上下文方式的启发，我们提出了一种基于推理的提示框架，引导LLMs逐步利用这些上下文线索推断意图。为激励LLMs通过前置代码进行推理并推断意图，我们进一步构建了包含4万个示例的数据集，每个示例均标注了中间推理轨迹及对应文档字符串。在DevEval和ComplexCodeEval上的大量实验表明，多个模型在性能上取得一致性提升，其中DeepSeekCoder和CodeLLaMA系列模型的pass@1指标相对提升超过25%。基于该框架，我们进一步开发了支持轻量级人类反馈的意图交互平台。该平台允许开发者从一组候选意图中选择或编辑意图以更好地引导模型。实验表明，这种交互式方法可带来进一步的性能提升。