Background: The standard regulatory approach to assessing replication success is the two-trials rule, which requires both the original and the replication study to be statistically significant, with effect estimates in the same direction. The sceptical p-value was recently proposed as an alternative method for the statistical assessment of the replicability of study results. Methods: We compare the statistical properties of the sceptical p-value and the two-trials rule. We illustrate the performance of both methods using real-world evidence emulations of randomized controlled trials (RCTs) conducted within the RCT DUPLICATE initiative. Results: The sceptical p-value depends not only on the two p-values, but also on the sample sizes and effect sizes of the two studies. It can be calibrated to have the same Type-I error rate as the two-trials rule, while having greater power to detect an existing effect. When applied to the results from the RCT DUPLICATE initiative, the sceptical p-value leads to results that are qualitatively similar to those of the two-trials rule, but tends to show more evidence for treatment effects. Conclusion: The sceptical p-value is a valid statistical measure for assessing the replicability of study results and is especially useful in the context of real-world evidence emulations.
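To make the Results statement concrete, the following is a minimal sketch of how the two criteria can be computed from effect estimates and standard errors. It is not taken from the abstract: it assumes the definition commonly given in the sceptical p-value literature, in which the squared sceptical z-value z_S² solves (z_o²/z_S² − 1)(z_r²/z_S² − 1) = c, with c the ratio of the original to the replication variance, and it assumes normally distributed effect estimates. The function names and the example numbers are hypothetical, and the calibration of the threshold mentioned in the abstract is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, used for p-values


def two_trials_rule(z_o: float, z_r: float, alpha: float = 0.025) -> bool:
    """Two-trials rule: both studies significant at one-sided level alpha,
    with effect estimates pointing in the same direction."""
    same_direction = z_o * z_r > 0
    p_o = 1.0 - STD_NORMAL.cdf(abs(z_o))
    p_r = 1.0 - STD_NORMAL.cdf(abs(z_r))
    return same_direction and p_o <= alpha and p_r <= alpha


def sceptical_z_squared(z_o: float, z_r: float, c: float) -> float:
    """Positive root of (c - 1) x^2 + (z_o^2 + z_r^2) x - z_o^2 z_r^2 = 0,
    i.e. the x solving (z_o^2/x - 1)(z_r^2/x - 1) = c (assumed definition).
    Written in a rearranged form that also covers c == 1 without a
    division by zero."""
    product = z_o**2 * z_r**2
    if product == 0.0:
        return 0.0
    mean_sq = (z_o**2 + z_r**2) / 2.0  # arithmetic mean of squared z-values
    return product / (mean_sq + sqrt(mean_sq**2 + (c - 1.0) * product))


def sceptical_p_value(theta_o: float, se_o: float,
                      theta_r: float, se_r: float) -> float:
    """One-sided sceptical p-value (uncalibrated) from the effect estimates
    and standard errors of the original (o) and replication (r) study."""
    z_o = theta_o / se_o
    z_r = theta_r / se_r
    c = se_o**2 / se_r**2  # variance ratio; reflects relative sample size
    z_s = sqrt(sceptical_z_squared(z_o, z_r, c))
    upper_tail = 1.0 - STD_NORMAL.cdf(z_s)
    # Estimates in opposite directions count as evidence against replication.
    return upper_tail if z_o * z_r > 0 else 1.0 - upper_tail


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    theta_o, se_o = 0.5, 0.2
    theta_r, se_r = 0.4, 0.2
    print("two-trials rule satisfied:",
          two_trials_rule(theta_o / se_o, theta_r / se_r))
    print("sceptical p-value:",
          sceptical_p_value(theta_o, se_o, theta_r, se_r))
```

Because the variance ratio c enters the calculation, two pairs of studies with identical p-values can yield different sceptical p-values, which is the dependence on sample size and effect size described above. The two-trials rule, by contrast, uses only the two p-values and the direction of the estimates.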