In collaborative software development, multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes. These simultaneous changes must eventually be merged into a single version of the source code. However, the merge operation can fail, in which case a developer has to step in and resolve the conflicts manually. Studies in the literature report that 10 to 20 percent of all merge attempts result in conflicts of this kind. In this paper, we focus on a specific type of change that affects the structure of the source code and has the potential to increase the merge effort: code refactorings. We analyze the relationship between the occurrence of refactorings and the merge effort. To do so, we apply a data mining technique called association rule extraction to find patterns that reveal how refactorings influence the merge effort. Our experiments extracted association rules from 40,248 merge commits in 28 popular open-source projects. The results indicate that: (i) the occurrence of refactorings increases the chances that a merge requires effort; (ii) the more refactorings, the greater the chances of effort; (iii) the more refactorings, the greater the effort; and (iv) parallel refactorings further increase both the chances of effort and its intensity. These results suggest that development teams may benefit from changing the way they carry out refactorings. They also point to possible improvements in tools that support code merging and in tools that recommend refactorings, which could take the number of refactorings and the merge effort into account.
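To illustrate how an association rule such as "refactoring implies merge effort" can be evaluated, the following minimal sketch computes the standard support, confidence, and lift of a rule over a set of merges. The `Merge` record, the `rule_metrics` helper, and the ten toy merges are hypothetical and serve only as an illustration; they are not the data, tooling, or thresholds used in the study.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Merge:
    """Hypothetical record for one merge commit (illustration only)."""
    has_refactoring: bool  # at least one refactoring in the merged branches
    has_effort: bool       # the merge required manual effort


def rule_metrics(
    merges: Sequence[Merge],
    antecedent: Callable[[Merge], bool],
    consequent: Callable[[Merge], bool],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    n = len(merges)
    n_a = sum(1 for m in merges if antecedent(m))
    n_c = sum(1 for m in merges if consequent(m))
    n_ac = sum(1 for m in merges if antecedent(m) and consequent(m))
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("rule is undefined on this data set")
    support = n_ac / n                # P(antecedent and consequent)
    confidence = n_ac / n_a           # P(consequent | antecedent)
    lift = confidence / (n_c / n)     # confidence relative to P(consequent)
    return support, confidence, lift


# Toy data (assumed): 4 merges with refactorings, 3 of which needed effort;
# 6 merges without refactorings, 2 of which needed effort.
merges = (
    [Merge(True, True)] * 3
    + [Merge(True, False)] * 1
    + [Merge(False, True)] * 2
    + [Merge(False, False)] * 4
)

support, confidence, lift = rule_metrics(
    merges,
    antecedent=lambda m: m.has_refactoring,
    consequent=lambda m: m.has_effort,
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means that, in the toy data, merge effort is more frequent among merges with refactorings than among merges overall, which is the kind of pattern the rules are meant to expose.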