Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of program variants is created by applying mutation operations to the source code in order to find potential patches for bugs. Most commonly, each mutation operator is selected uniformly at random during the search. This inefficient selection step produces many variants that do not compile or that break intended functionality, wasting considerable resources. In this paper, we address this issue and propose a reinforcement learning-based approach to optimise the selection of mutation operators in heuristic-based program repair. Our solution is agnostic to programming language, granularity level, and search strategy, and can be easily integrated into existing heuristic-based repair tools. We conduct extensive experimentation on four operator selection techniques, two reward types, two credit assignment strategies, two integration methods, and three sets of mutation operators, using 22,300 independent repair attempts. We evaluate our approach on 353 real-world bugs from the Defects4J benchmark. Results show that the epsilon-greedy multi-armed bandit algorithm with average credit assignment is best for mutation operator selection. Our approach achieves a 17.3% improvement over the baseline, generating patches for 9 additional bugs for a total of 61 patched bugs in the Defects4J benchmark.
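To make the best-performing configuration named in the abstract more concrete, the following is a minimal sketch of epsilon-greedy operator selection with average credit assignment. It is an illustration under stated assumptions, not the authors' implementation: the class name `EpsilonGreedyOperatorSelector`, the operator names, the `epsilon` value, and the toy reward function are all hypothetical, and the abstract does not specify how rewards are computed.

```python
import random
from collections import defaultdict


class EpsilonGreedyOperatorSelector:
    """Epsilon-greedy bandit over mutation operators (illustrative sketch)."""

    def __init__(self, operators, epsilon=0.2, seed=None):
        if not operators:
            raise ValueError("at least one operator is required")
        self.operators = list(operators)
        self.epsilon = epsilon              # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # number of rewards seen per operator
        self.quality = defaultdict(float)   # running mean reward per operator

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        best = max(self.quality[op] for op in self.operators)
        # Break ties at random so equally rated operators share the picks.
        candidates = [op for op in self.operators if self.quality[op] == best]
        return self.rng.choice(candidates)

    def update(self, operator, reward):
        # Average credit assignment: quality is the mean of all rewards so far,
        # maintained incrementally.
        self.counts[operator] += 1
        n = self.counts[operator]
        self.quality[operator] += (reward - self.quality[operator]) / n


if __name__ == "__main__":
    # Hypothetical operators and success rates, used only to drive the demo.
    success_rate = {"delete": 0.10, "insert": 0.30, "replace": 0.20}
    selector = EpsilonGreedyOperatorSelector(success_rate, epsilon=0.2, seed=0)
    env = random.Random(1)
    for _ in range(500):
        op = selector.select()
        # Toy reward: 1.0 if the simulated variant "succeeds", else 0.0.
        reward = 1.0 if env.random() < success_rate[op] else 0.0
        selector.update(op, reward)
    print({op: round(selector.quality[op], 3) for op in success_rate})
```

In a repair tool, `select` would replace the uniform random choice of mutation operator, and `update` would be called with a reward derived from evaluating the resulting variant (for example, whether it compiles or how many tests it passes; both are assumptions here).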