Software bugs are an ever-present concern for developers, and patching them incurs considerable cost through complex operations. In contrast, introducing bugs can be effortless, in that even a simple mutation can easily break the Program Under Test (PUT). Existing research has considered these two opposed activities largely separately, either trying to automatically generate realistic patches to help developers, or trying to find realistic bugs to simulate and prevent future defects. Despite the fundamental differences between them, however, we hypothesise that they do not differ syntactically when considered simply as code changes. To examine this hypothesis systematically, we investigate the relationship between patches and buggy commits, both manually and automatically generated, using clustering and pattern analysis. A large-scale empirical evaluation reveals that up to 70% of patches and faults can be clustered together based on the similarity of their lexical patterns; further, 44% of the code changes can be abstracted into identical change patterns. Moreover, we investigate whether code mutation tools can be used as Automated Program Repair (APR) tools, and APR tools as code mutation tools. In both cases, the inverted use of mutation and APR tools can perform surprisingly well, or even better, when compared to their original, intended uses. For example, 89% of the patches found by SequenceR, a deep-learning-based APR tool, can also be found by its inversion, i.e., a model trained with faults rather than patches. Similarly, a real-fault coupling study of mutants reveals that TBar, a template-based APR tool, can generate 14% and 3% more fault couplings than the traditional mutation tools PIT and Major, respectively, when used as a mutation tool.