The problem of combining p-values is an old and fundamental one, and the classic assumption of independence is violated or unverifiable in many applications. There are many well-known rules that can combine a set of arbitrarily dependent p-values (for the same hypothesis) into a single p-value. We show that essentially all of these existing rules can be strictly improved when the p-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well-known rules like "twice the median" and "twice the average", as well as of the geometric and harmonic means. Exchangeable p-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined p-value stabilizes. Our work also improves rules for combining arbitrarily dependent p-values, since such p-values become exchangeable when they are presented to the analyst in a random order. The main technical advance is to show that all existing combination rules can be obtained by calibrating the p-values to e-values (using an $\alpha$-dependent calibrator), averaging those e-values, converting to a level-$\alpha$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests; the improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
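The calibrate, average, and apply-Markov recipe described above can be sketched for the "twice the average" rule. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, and the calibrator $p \mapsto (2 - 2p/\alpha)_+/\alpha$ is an assumption chosen here because it integrates to one under a uniform p-value and rejects whenever twice the average is at most $\alpha$. The randomized and exchangeable tests use the standard forms of the corresponding Markov-type inequalities (threshold $U/\alpha$ with an independent uniform $U$, and running averages, respectively).

```python
import math
import random
from statistics import mean


def twice_average(pvals):
    """Classic 'twice the average' rule, capped at 1."""
    return min(1.0, 2.0 * mean(pvals))


def twice_median(pvals):
    """'Twice the median', using the ceil(K/2)-th smallest p-value."""
    k = math.ceil(len(pvals) / 2)
    return min(1.0, 2.0 * sorted(pvals)[k - 1])


def calibrate(pvals, alpha):
    """Assumed alpha-dependent calibrator: p -> (2 - 2p/alpha)_+ / alpha.

    For a uniform p-value this has expectation 1, so each output is an e-value.
    """
    return [max(0.0, 2.0 - 2.0 * p / alpha) / alpha for p in pvals]


def markov_test(pvals, alpha):
    """Reject when the average e-value reaches 1/alpha (Markov's inequality)."""
    return mean(calibrate(pvals, alpha)) >= 1.0 / alpha


def randomized_markov_test(pvals, alpha, u):
    """Randomized variant: the threshold 1/alpha is lowered to u/alpha.

    u must be drawn from Uniform(0, 1) independently of the p-values.
    """
    return mean(calibrate(pvals, alpha)) >= u / alpha


def exchangeable_markov_test(pvals, alpha):
    """Exchangeable variant: reject if ANY running average reaches 1/alpha.

    Only meaningful when the p-values are exchangeable, for example after
    being placed in a uniformly random order.
    """
    total = 0.0
    for t, e in enumerate(calibrate(pvals, alpha), start=1):
        total += e
        if total / t >= 1.0 / alpha:
            return True
    return False


if __name__ == "__main__":
    pvals = [0.01, 0.02, 0.03, 0.04]  # made-up values for illustration only
    alpha = 0.04
    print("twice the average:", twice_average(pvals))
    print("twice the median:", twice_median(pvals))
    print("Markov test rejects:", markov_test(pvals, alpha))
    print("randomized test rejects:",
          randomized_markov_test(pvals, alpha, u=random.random()))
    shuffled = random.sample(pvals, k=len(pvals))  # random order
    print("exchangeable test rejects:",
          exchangeable_markov_test(shuffled, alpha))
```

Because the randomized threshold $u/\alpha$ never exceeds $1/\alpha$ and the running-average test includes the full average as its last step, both variants reject whenever the plain Markov test does, which is the sense in which they can only improve on it.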