Deep learning models of source code have been applied very successfully to automated program repair (APR). A standing issue is the small input window of current models, which often cannot fit all the context code required for a bug fix (e.g., method or class declarations elsewhere in the project). As a result, the input is often restricted to the local context, that is, the lines immediately before and after the bug location. In this work we study how important this local context is for repair success: How much local context is needed? Is context before or after the bug location more important? How is the importance of local context tied to the bug type? To answer these questions, we train and evaluate Transformer models in many different local context configurations on three datasets and two programming languages. Our results indicate that overall repair success increases with the size of the local context (albeit not for all bug types), and they confirm the common practice of devoting roughly 50-60% of the input window to the context preceding the bug. Our results are relevant not only to researchers working on Transformer-based APR tools but also to benchmark and dataset creators, who must decide what context, and how much of it, to include in their datasets.
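To make the idea of splitting a fixed input window concrete, here is a minimal sketch (not the authors' implementation) of how local context might be selected around a buggy span so that a chosen fraction of the remaining budget goes to lines before the bug. The function name `local_context_window`, its `before_ratio` parameter, and the whitespace-based token count are hypothetical choices for illustration. Real Transformer pipelines would count subword tokens with the model's own tokenizer.

```python
def local_context_window(lines, bug_start, bug_end, budget, before_ratio=0.55):
    """Hypothetical helper: keep the buggy lines [bug_start, bug_end) and add
    whole lines of context above and below them until a token budget is spent.
    About `before_ratio` of the leftover budget is reserved for leading context.

    Assumption: tokens are approximated by whitespace splitting."""

    def n_tokens(line):
        return len(line.split())

    bug_lines = lines[bug_start:bug_end]
    remaining = budget - sum(n_tokens(l) for l in bug_lines)
    if remaining < 0:
        raise ValueError("the buggy lines alone exceed the budget")

    # Split the leftover budget between leading and trailing context.
    before_budget = int(remaining * before_ratio)
    after_budget = remaining - before_budget

    # Walk upward from the bug, adding whole lines while they still fit.
    before, used = [], 0
    for line in reversed(lines[:bug_start]):
        cost = n_tokens(line)
        if used + cost > before_budget:
            break
        before.append(line)
        used += cost
    before.reverse()  # restore original top-to-bottom order

    # Walk downward from the bug, adding whole lines while they still fit.
    after, used = [], 0
    for line in lines[bug_end:]:
        cost = n_tokens(line)
        if used + cost > after_budget:
            break
        after.append(line)
        used += cost

    return before, bug_lines, after


if __name__ == "__main__":
    src = ["a b", "c d", "e f", "BUG x", "g h", "i j", "k l"]
    print(local_context_window(src, 3, 4, budget=10, before_ratio=0.6))
```

With `before_ratio` between 0.5 and 0.6, the sketch mirrors the practice of devoting about half or slightly more of the window to leading context. For simplicity, budget left unused on one side (for instance, when the bug sits near the top of a file) is not reassigned to the other side; a real pipeline might redistribute it.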