Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of obtained data and accounts for subject variability, but it introduces threats to internal validity such as the learning and carryover effects. Vegas et al. reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address these threats during data analysis while still harnessing the design's benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover-design experiments in SE publications between 2015 and March 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design experiments and evaluate their data analysis against the provided guidelines. The results show that the validity of data analyses has improved compared to the original state of analysis. Still, despite the explicit guidelines, only 29.5% of all threats to validity were addressed properly. While the maturation and optimal-sequence threats are properly addressed in 35.8% and 38.8% of all studies in our sample, respectively, the carryover threat is modeled in only about 3% of the observed cases. The lack of adherence to the analysis guidelines threatens the validity of the conclusions drawn from crossover-design experiments.