Code intelligence is an emerging domain in software engineering that aims to improve the effectiveness and efficiency of various code-related tasks. Recent research suggests that incorporating contextual information beyond the original task inputs (i.e., source code) can substantially enhance model performance. Such contextual signals may be obtained directly or indirectly, from sources such as API documentation or from intermediate representations such as abstract syntax trees. Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence. To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. Our investigation yields four main contributions: (1) a quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) a novel taxonomy of context types used in code intelligence; (3) a task-oriented analysis of context integration strategies across diverse code intelligence tasks; and (4) a critical assessment of the evaluation methodologies used for context-aware methods. Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.