Continuous Integration (CI) is a software engineering practice that aims to reduce the cost and risk of code integration among teams. Recent empirical studies have confirmed associations between CI and software quality (SQ). However, no existing study investigates causal relationships between CI and SQ. This paper investigates these relationships by applying the causal Directed Acyclic Graph (DAG) technique. We combine two other strategies to support this technique: a literature review and a Mining Software Repositories (MSR) study. In the first stage, we review the literature to discover existing associations between CI and SQ, which help us create a "literature-based causal DAG" in the second stage. This DAG encapsulates the literature assumptions regarding CI and its influence on SQ. In the third stage, we analyze 12 months of activity for 70 open-source projects (35 CI and 35 no-CI projects) by mining software repositories. This MSR study is not a typical "correlation is not causation" study because it is used to verify the relationships uncovered in the causal DAG produced in the first two stages. The fourth stage consists of testing the statistical implications of the "literature-based causal DAG" on our dataset. Finally, in the fifth stage, we build a DAG with observations from the literature and the dataset, the "literature-data DAG". In addition to the direct causal effect of CI on SQ, we find evidence of indirect effects of CI. For example, CI affects teams' communication, which positively impacts SQ. We also highlight the confounding effect of project age.
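The statistical implications of a causal DAG are usually conditional independencies that can be read off the graph by d-separation and then checked against data. The sketch below illustrates that idea only. The edge list is a hypothetical toy DAG assembled from the relationships named in the abstract (a direct CI effect, an indirect effect through communication, and project age as a confounder); it is not the paper's actual DAG, and the node names and the `d_separated` helper are assumptions introduced for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy DAG (cause, effect); NOT the DAG from the paper.
EDGES = [
    ("age", "ci"),             # project age influences CI adoption
    ("age", "sq"),             # project age influences software quality
    ("ci", "communication"),   # CI affects team communication
    ("ci", "sq"),              # direct effect of CI on quality
    ("communication", "sq"),   # indirect path: communication affects quality
]


def parents_of(edges):
    """Map every node to the set of its direct causes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents


def ancestral_set(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def d_separated(edges, x, y, given=()):
    """Check whether x and y are d-separated by `given` in the DAG.

    Uses the moralized ancestral graph criterion: keep only the ancestors
    of x, y and the conditioning set, connect parents that share a child,
    drop edge directions, remove the conditioning nodes, and test whether
    x can still reach y.
    """
    parents = parents_of(edges)
    given = set(given)
    keep = ancestral_set(parents, {x, y} | given)

    neighbours = {node: set() for node in keep}
    for child in keep:
        for parent in parents[child]:
            neighbours[child].add(parent)
            neighbours[parent].add(child)
        for a, b in combinations(parents[child], 2):  # "marry" co-parents
            neighbours[a].add(b)
            neighbours[b].add(a)

    queue, visited = deque([x]), {x}
    while queue:
        node = queue.popleft()
        if node == y:
            return False  # an open path exists
        for nxt in neighbours[node]:
            if nxt not in visited and nxt not in given:
                visited.add(nxt)
                queue.append(nxt)
    return True


if __name__ == "__main__":
    # Implication of the toy DAG: age is independent of communication given CI.
    print(d_separated(EDGES, "age", "communication", {"ci"}))
    # Without conditioning on CI, the path age -> ci -> communication is open.
    print(d_separated(EDGES, "age", "communication"))
```

Under this toy graph, an implication such as "age is independent of communication given CI" is the kind of statement one would then test on the mined dataset, for example with a partial-correlation or other conditional independence test. An implication that fails on the data is a reason to revise the DAG.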