A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu, Diego Marcilio, Omar Alam, Abdullah Aldaeej, Idan Amit, Burak Turhan, Simon Eismann, Anna-Katharina Wickert, Ivano Malavolta, Matus Sulir, Fatemeh Fard, Austin Z. Henley, Stratos Kourtzanidis, Eray Tuzun, Christoph Treude, Simin Maleki Shamasbi, Ivan Pashchenko, Marvin Wyrich, James Davis, Alexander Serebrenik, Ella Albrecht, Ethem Utku Aktas, Daniel Strüber, Johannes Erbel

from arXiv, Status: Accepted at Empirical Software Engineering

Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs but also other concerns that are irrelevant to the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and of the types of changes that are tangled within bug fixing commits. Methods: We use a crowdsourcing approach to manual labeling to validate, for each line in bug fixing commits, which changes contribute to the bug fix. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we consider only changes to production code files, this ratio increases to between 66% and 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy until proven otherwise.
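The consensus rule from the Methods (four labels per line, consensus when at least three agree) can be illustrated with a minimal sketch. This is not the authors' tooling: the function name `consensus_label`, the `threshold` parameter, and the label strings such as `"bugfix"` and `"test"` are assumptions made only for this example.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], threshold: int = 3) -> Optional[str]:
    """Return the label given by at least `threshold` participants, else None.

    Hypothetical helper: with four labels per line and a threshold of three,
    at most one label can reach consensus.
    """
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= threshold else None


# Illustrative votes for two changed lines (label names are assumed).
agreed = consensus_label(["bugfix", "bugfix", "bugfix", "test"])
split = consensus_label(["bugfix", "bugfix", "test", "test"])
print(agreed, split)
```

A line whose votes split two against two returns `None`, which corresponds to the case where the participants reach no consensus.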
