Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of program variants, created by applying mutations to the program, is explored to find potential patches for bugs. Most commonly, each mutation operator used during the search is selected uniformly at random, which can generate many buggy and even uncompilable program variants. Our goal is to reduce the generation of variants that do not compile or that break intended functionality, since such variants waste considerable resources. In this paper, we investigate the feasibility of a reinforcement learning-based approach for selecting mutation operators in heuristic-based program repair. Our proposed approach is agnostic to programming language, granularity level, and search strategy, and can be easily integrated into existing heuristic-based repair tools. We conduct an extensive empirical evaluation of four operator selection techniques, two reward types, two credit assignment strategies, two integration methods, and three sets of mutation operators, using 30,080 independent repair attempts. We evaluate our approach on 353 real-world bugs from the Defects4J benchmark. Reinforcement learning-based mutation operator selection produces a higher number of test-passing variants, but does not noticeably increase the number of bugs patched compared with the baseline, which uses random selection. While reinforcement learning has previously been shown to improve the search of evolutionary algorithms, which are often used in heuristic-based program repair, it did not yield such improvements when applied to this area of research.
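The abstract does not specify which operator selection techniques, reward types, or credit assignment strategies are evaluated. Purely as an illustration, the Python sketch below frames mutation operator selection as a multi-armed bandit, one common way to apply reinforcement learning to this kind of choice: an epsilon-greedy selector that credits each operator with the running average of its rewards. The class name `EpsilonGreedyOperatorSelector`, the operator names (`append`, `delete`, `replace`), the reward function `hypothetical_reward`, and the toy success probabilities are all assumptions, not the paper's implementation. Setting `epsilon=1.0` makes every selection uniformly random, which corresponds to the random baseline described above.

```python
import random
from collections import defaultdict


class EpsilonGreedyOperatorSelector:
    """Hypothetical bandit-style selector for mutation operators.

    Each operator is an arm; its estimated value is the running mean of the
    rewards credited to it (an assumed "average" credit assignment).
    """

    def __init__(self, operators, epsilon=0.1, seed=None):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # times each operator was credited
        self.values = defaultdict(float)    # running mean reward per operator

    def select(self):
        # Explore: choose uniformly at random (the usual baseline behaviour).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        # Exploit: choose the operator with the highest estimated reward,
        # breaking ties (e.g. before any feedback) at random.
        best = max(self.values[op] for op in self.operators)
        candidates = [op for op in self.operators if self.values[op] == best]
        return self.rng.choice(candidates)

    def update(self, operator, reward):
        # Incremental mean: v <- v + (r - v) / n
        self.counts[operator] += 1
        n = self.counts[operator]
        self.values[operator] += (reward - self.values[operator]) / n


def hypothetical_reward(compiles, tests_passed, tests_total):
    # Assumed reward: 0 for an uncompilable variant, otherwise the fraction
    # of tests the variant passes.
    if not compiles or tests_total == 0:
        return 0.0
    return tests_passed / tests_total


if __name__ == "__main__":
    ops = ["append", "delete", "replace"]
    selector = EpsilonGreedyOperatorSelector(ops, epsilon=0.2, seed=0)
    # Toy environment (assumed): probability that a variant produced by each
    # operator compiles; a compiling variant here passes all 10 tests.
    success_prob = {"append": 0.2, "delete": 0.5, "replace": 0.8}
    env_rng = random.Random(1)
    for _ in range(500):
        op = selector.select()
        ok = env_rng.random() < success_prob[op]
        selector.update(op, hypothetical_reward(ok, 10, 10))
    for op in ops:
        print(op, selector.counts[op], round(selector.values[op], 2))
```

In a repair tool, `select()` would be called wherever the search currently draws an operator at random, and `update()` would be called once the resulting variant has been compiled and tested. As the abstract reports, steering selection toward operators that yield test-passing variants does not by itself guarantee that more bugs get patched.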