Context: Mining software repositories is a popular means of gaining insights into a software project's evolution, monitoring project health, supporting decisions and deriving best practices. Researchers and practitioners commonly apply tools that support the mining process, but the limitations of these tools and the extent to which they agree are often not well understood. Objective: This study investigates threats to validity in complex tool pipelines for evolutionary software analyses and evaluates the tools' agreement in terms of data, study outcomes and conclusions for the same research questions. Method: We conduct a lightweight literature review to select three studies from highly ranked venues, covering collaboration and coordination, software maintenance and software quality. We formally replicate these studies with four independent, systematically selected mining tools and compare the extracted data, analysis results and conclusions both quantitatively and qualitatively. Results: We find that numerous technical details in tool design and implementation accumulate along the complex mining pipelines and can cause substantial differences in the extracted baseline data, its derivatives, the subsequent results of statistical analyses and, under specific circumstances, the conclusions. Conclusions: Users must choose tools carefully and evaluate their limitations in order to adequately assess the scope of validity. Reusing tools is recommended. Researchers and tool authors can promote reusability and help reduce uncertainties by providing reproduction packages and conducting comparative studies that follow our approach.