Statistical significance of both the original and the replication study is a commonly used criterion for assessing replication attempts, known in drug development as the two-trials rule. However, replication studies are sometimes conducted even though the original study is non-significant, in which case Type-I error rate control across both studies is no longer guaranteed. We propose an alternative method to assess replicability using the sum of the p-values from the two studies. The approach provides a combined p-value and can be calibrated to control the overall Type-I error rate at the same level as the two-trials rule, while allowing for replication success even if the original study is non-significant. The unweighted version requires a less restrictive significance level at replication if the original study is already convincing, which facilitates sample size reductions of up to 10%. Downweighting the original study accounts for possible bias and requires a more stringent significance level and larger sample sizes at replication. Data from four large-scale replication projects are used to illustrate the proposed method and to compare it with the two-trials rule, meta-analysis and Fisher's combination method.
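The unweighted sum-of-p-values idea can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: both p-values are one-sided and independent, the per-study level is alpha = 0.025, and the two-trials rule therefore has an overall Type-I error rate of alpha squared. Under the null hypothesis the sum of two independent uniform p-values follows a triangular distribution, whose distribution function gives the combined p-value. The function names `sum_combined_p`, `sum_threshold` and `replication_success` are hypothetical, and the weighted version is not covered.

```python
import math


def sum_combined_p(p_o: float, p_r: float) -> float:
    """Combined p-value from the sum of two independent one-sided p-values.

    Under the null, p_o + p_r is the sum of two independent Uniform(0, 1)
    variables, so its distribution function is triangular on [0, 2].
    """
    for p in (p_o, p_r):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    s = p_o + p_r
    if s <= 1.0:
        return s * s / 2.0
    return 1.0 - (2.0 - s) ** 2 / 2.0


def sum_threshold(alpha: float = 0.025) -> float:
    """Largest p-value sum whose combined p-value is at most alpha**2.

    Solves s**2 / 2 = alpha**2, which holds for s <= 1 (assumed here).
    """
    return math.sqrt(2.0) * alpha


def replication_success(p_o: float, p_r: float, alpha: float = 0.025) -> bool:
    """Success if the combined p-value does not exceed the overall
    Type-I error rate of the two-trials rule (assumed to be alpha**2)."""
    return sum_combined_p(p_o, p_r) <= alpha ** 2


if __name__ == "__main__":
    # Original study non-significant at 0.025, replication very convincing.
    print(replication_success(0.03, 0.004))   # True
    # Both studies significant at 0.025, but the sum is too large.
    print(replication_success(0.02, 0.02))    # False
    # Convincing original study relaxes the requirement at replication.
    print(replication_success(0.001, 0.03))   # True
```

In this sketch the success region is p_o + p_r <= sqrt(2) * alpha, roughly 0.035 for alpha = 0.025, so a pair such as (0.02, 0.02) passes the two-trials rule but not the sum-based criterion, while (0.03, 0.004) does the opposite.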