Significant progress has recently been made on deriving combination rules that take as input a set of arbitrarily dependent p-values and produce as output a single valid p-value. Here, we show that under the assumption of exchangeability of the p-values, many of those rules can be improved (made more powerful). While this observation by itself has practical implications (for example, for repeated tests involving data splitting), it also has implications for combining arbitrarily dependent p-values, since the latter can be made exchangeable by applying a uniformly random permutation. In particular, we derive several simple randomized combination rules for arbitrarily dependent p-values that are more powerful than their deterministic counterparts. For example, we derive randomized and exchangeable improvements of well-known p-value combination rules like "twice the median" and "twice the average", as well as rules based on the geometric and harmonic means. The main technical advance is to show that all these combination rules can be obtained by calibrating the p-values to e-values (using an $\alpha$-dependent calibrator), averaging those e-values, converting to a level-$\alpha$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests. The improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
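As a rough illustration of the recipe described above, the following Python sketch works through the "twice the average" rule. It is not taken from the paper: the function names are hypothetical, and the formulas are what one obtains by assuming the unclipped calibrator $p \mapsto 2 - 2p/\alpha$ and then applying, respectively, the plain, exchangeable (running-mean), and uniformly randomized forms of Markov's inequality. The rules actually stated in the paper may differ in detail.

```python
import random
from itertools import accumulate


def twice_average(pvals):
    """Deterministic 'twice the average' rule, capped at 1."""
    return min(1.0, 2.0 * sum(pvals) / len(pvals))


def exchangeable_twice_average(pvals):
    """Sketch of an exchangeable variant: twice the smallest running mean.

    Assumes the p-values are exchangeable; the result depends on their order.
    """
    running_means = [s / k for k, s in enumerate(accumulate(pvals), start=1)]
    return min(1.0, 2.0 * min(running_means))


def randomized_twice_average(pvals, u):
    """Sketch of a randomized variant: 2 * mean / (2 - u).

    Assumes u is a Uniform(0, 1) draw that is independent of the p-values.
    """
    mean = sum(pvals) / len(pvals)
    return min(1.0, 2.0 * mean / (2.0 - u))


def permuted_exchangeable_twice_average(pvals, rng):
    """For arbitrarily dependent p-values: shuffle uniformly at random,
    then apply the exchangeable variant to the shuffled sequence."""
    shuffled = list(pvals)
    rng.shuffle(shuffled)
    return exchangeable_twice_average(shuffled)


# Illustrative inputs only (made-up p-values, fixed seed for reproducibility).
example_pvals = [0.01, 0.2, 0.03, 0.5]
rng = random.Random(0)

print("deterministic:", twice_average(example_pvals))
print("exchangeable: ", exchangeable_twice_average(example_pvals))
print("randomized:   ", randomized_twice_average(example_pvals, rng.random()))
print("permuted:     ", permuted_exchangeable_twice_average(example_pvals, rng))
```

In this sketch the last running mean equals the overall mean, and $2 - u \ge 1$, so neither variant can exceed the deterministic "twice the average" value. That matches the sense in which the abstract calls the new rules improvements.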