Statistical significance of both the original and the replication study is a commonly used criterion to assess replication attempts; in drug development it is known as the two-trials rule. However, replication studies are sometimes conducted even though the original study is non-significant, in which case control of the Type-I error rate across both studies is no longer guaranteed. We propose an alternative method to assess replicability based on the sum of the p-values from the two studies. The approach provides a combined p-value and can be calibrated to control the overall Type-I error rate at the same level as the two-trials rule, while still allowing replication success if the original study is non-significant. If the original study is already convincing, the unweighted version requires a less restrictive significance level at replication, which enables sample size reductions of up to 10%. Downweighting the original study accounts for possible bias but requires a more stringent significance level and larger sample sizes at replication. Data from four large-scale replication projects are used to illustrate the proposed method and compare it with the two-trials rule, meta-analysis and Fisher's combination method.
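The calibration idea for the unweighted version can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes independent one-sided p-values that are Uniform(0, 1) under the null and the conventional one-sided level α = 0.025, and all function names are hypothetical. Under these assumptions the rule reduces to Edgington's classical sum-of-p-values method. The sum of two independent uniform p-values has a triangular null distribution, so the two-trials rule's overall Type-I error α² is matched by requiring p_o + p_r ≤ √2·α. The weighting scheme for downweighting the original study is not specified here, so only the unweighted version is shown.

```python
import math

ALPHA = 0.025  # assumed one-sided level per study under the two-trials rule


def sum_combined_p(p_o: float, p_r: float) -> float:
    """Combined p-value from s = p_o + p_r (Edgington's method, two studies).

    With independent Uniform(0, 1) p-values under H0, s has a triangular
    distribution on [0, 2], so P(S <= s) is piecewise quadratic.
    """
    s = p_o + p_r
    if s <= 1.0:
        return s ** 2 / 2.0
    return 1.0 - (2.0 - s) ** 2 / 2.0


def fisher_combined_p(p_o: float, p_r: float) -> float:
    """Fisher's combination for two p-values in (0, 1]: P(U1*U2 <= x) = x(1 - ln x)."""
    x = p_o * p_r
    return x * (1.0 - math.log(x))


def two_trials_success(p_o: float, p_r: float, alpha: float = ALPHA) -> bool:
    """Both studies significant at level alpha (overall Type-I error alpha**2)."""
    return p_o <= alpha and p_r <= alpha


# Calibration: match the two-trials rule's overall Type-I error alpha**2.
# Solving s**2 / 2 = alpha**2 gives the threshold on the sum of p-values.
SUM_THRESHOLD = math.sqrt(2.0) * ALPHA  # about 0.0354


def sum_success(p_o: float, p_r: float) -> bool:
    """Replication success if the sum of p-values is below the calibrated threshold."""
    return p_o + p_r <= SUM_THRESHOLD


def replication_level(p_o: float) -> float:
    """Significance level required at replication, given the original p-value."""
    return max(0.0, SUM_THRESHOLD - p_o)


if __name__ == "__main__":
    for p_o, p_r in [(0.005, 0.03), (0.03, 0.004), (0.02, 0.02)]:
        print(
            f"p_o={p_o}, p_r={p_r}: "
            f"two-trials={two_trials_success(p_o, p_r)}, "
            f"sum={sum_success(p_o, p_r)}, "
            f"combined p={sum_combined_p(p_o, p_r):.6f}, "
            f"Fisher p={fisher_combined_p(p_o, p_r):.6f}"
        )
```

With p_o = 0.005, the required replication level is about 0.0304, which is less restrictive than 0.025. With a non-significant original result of p_o = 0.03, replication can still succeed if p_r ≤ about 0.0054. Conversely, two borderline results (0.02, 0.02) pass the two-trials rule but fail the sum criterion, because their sum exceeds √2·α.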