Large Language Models (LLMs) are increasingly relevant in Software Engineering research and practice, with Automated Bug Fixing (ABF) being one of their key applications. ABF involves transforming a buggy method into its fixed equivalent. A common preprocessing step in ABF is to remove comments from code before training. However, we hypothesize that comments may play a critical role in fixing certain types of bugs by providing valuable design and implementation insights. In this study, we investigate how the presence or absence of comments, both during training and at inference time, affects the bug-fixing capabilities of LLMs. We conduct an empirical evaluation comparing two model families, each evaluated under all combinations of training and inference conditions (with and without comments), thereby revisiting the common practice of removing comments during training. To address the limited availability of comments in state-of-the-art datasets, we use an LLM to automatically generate comments for methods that lack them. Our findings show that comments improve ABF accuracy by up to a factor of three when present in both phases, while training with comments does not degrade performance on instances that lack them. Additionally, an interpretability analysis reveals that comments detailing a method's implementation are particularly effective in helping LLMs fix bugs accurately.