In several large-scale replication projects, statistically non-significant results in both the original and the replication study have been interpreted as a "replication success". Here we discuss the logical problems with this approach: non-significance in both studies does not ensure that the studies provide evidence for the absence of an effect, and "replication success" can virtually always be achieved if the sample sizes are small enough. In addition, the relevant error rates are not controlled. We show how methods such as equivalence testing and Bayes factors can be used to adequately quantify the evidence for the absence of an effect, and how they can be applied in the replication setting. Using data from the Reproducibility Project: Cancer Biology, we illustrate that many original and replication studies with "null results" are in fact inconclusive, and that their replicability is lower than suggested by the non-significance approach. We conclude that it is important to also replicate studies with statistically non-significant results, but that they should be designed, analyzed, and interpreted appropriately.
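To make the contrast between "non-significant" and "evidence for absence" concrete, here is a minimal sketch for the simplest case. It assumes that each study reports a normally distributed effect estimate with a known standard error. It compares a two-sided z-test p-value, a two one-sided tests (TOST) equivalence p-value for a margin of plus or minus `MARGIN`, and a Bayes factor `BF01` for a point null against a normal prior with standard deviation `PRIOR_SD` under the alternative. These are generic textbook constructions, not necessarily the authors' exact implementation. The margin, the prior standard deviation, and the two studies' estimates are hypothetical illustrations, not data from the Reproducibility Project: Cancer Biology.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def p_two_sided(estimate: float, se: float) -> float:
    """Two-sided z-test p-value for H0: theta = 0."""
    z = abs(estimate) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


def p_tost(estimate: float, se: float, margin: float) -> float:
    """TOST p-value for H0: |theta| >= margin vs H1: |theta| < margin.

    Equivalence (evidence for absence of a relevant effect) is declared
    at level alpha when the returned p-value is below alpha.
    """
    p_lower = 1.0 - STD_NORMAL.cdf((estimate + margin) / se)  # H0: theta <= -margin
    p_upper = STD_NORMAL.cdf((estimate - margin) / se)        # H0: theta >= +margin
    return max(p_lower, p_upper)


def bf01(estimate: float, se: float, prior_sd: float) -> float:
    """Bayes factor of H0: theta = 0 over H1: theta ~ N(0, prior_sd^2).

    Values above 1 favour the absence of an effect.
    """
    var0 = se ** 2                  # marginal variance of the estimate under H0
    var1 = se ** 2 + prior_sd ** 2  # marginal variance of the estimate under H1
    # Ratio of the normal densities N(estimate; 0, var0) / N(estimate; 0, var1)
    return math.sqrt(var1 / var0) * math.exp(
        -0.5 * estimate ** 2 * (1.0 / var0 - 1.0 / var1)
    )


# Hypothetical example: a small and a large study, both "null results".
MARGIN = 0.3    # hypothetical equivalence margin
PRIOR_SD = 1.0  # hypothetical prior standard deviation under H1
studies = {"small": (0.10, 0.50), "large": (0.02, 0.05)}  # (estimate, standard error)

for name, (est, se) in studies.items():
    print(
        f"{name}: p = {p_two_sided(est, se):.3f}, "
        f"p_TOST = {p_tost(est, se, MARGIN):.3g}, "
        f"BF01 = {bf01(est, se, PRIOR_SD):.2f}"
    )
```

Both hypothetical studies are non-significant at the 5% level. Only the large one yields a TOST p-value below 0.05 and a Bayes factor well above 10 in favour of the null. The small study is merely inconclusive, which illustrates why "non-significant in both studies" can be satisfied simply by using small samples.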