$P$-values derived from continuously distributed test statistics are typically uniformly distributed on $(0,1)$ under least favorable parameter configurations (LFCs) in the null hypothesis. A $p$-value $P$ is called conservative if, under the null hypothesis, $P$ is stochastically larger than a random variable that is uniformly distributed on $(0,1)$. Conservativeness can occur if the test statistic from which $P$ is derived is discrete, or if the true parameter value under the null is not an LFC. To address both sources of conservativeness, we present two approaches based on randomized $p$-values, namely single-stage and two-stage randomization. We illustrate their effectiveness for testing a composite null hypothesis under a binomial model, and we give an example of how the proposed $p$-values can be used to test a composite null hypothesis in group testing designs. In line with previous findings, the proposed randomized $p$-values are less conservative than non-randomized $p$-values under the null hypothesis, but they are stochastically not smaller under the alternative. Establishing the validity of randomized $p$-values is not trivial and has received attention in the literature. We show that the proposed randomized $p$-values are valid under various discrete statistical models in which the distribution of the corresponding test statistic belongs to an exponential family. We also investigate how the power function of the tests based on the proposed randomized $p$-values behaves as a function of the sample size. Simulations and a real data analysis are used to compare the $p$-values considered.
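To make the first source of conservativeness concrete, the following minimal sketch uses the classical randomized $p$-value for a discrete statistic in a one-sided binomial test of $H_0: \theta \le \theta_0$, evaluated at the LFC $\theta_0$. This is a textbook construction chosen for illustration and is not necessarily the single-stage or two-stage definition proposed in the paper. All function names, and the choice of the one-sided hypothesis, are assumptions made for this example.

```python
import math


def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(X = k) for X ~ Binomial(n, theta)."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def upper_tail(x: int, n: int, theta: float) -> float:
    """P(X >= x) for X ~ Binomial(n, theta); equals 0 when x > n."""
    return sum(binom_pmf(k, n, theta) for k in range(x, n + 1))


def nonrandomized_p(x: int, n: int, theta0: float) -> float:
    """LFC-based p-value P(X >= x) computed at theta0 (assumed H0: theta <= theta0)."""
    return upper_tail(x, n, theta0)


def randomized_p(x: int, n: int, theta0: float, u: float) -> float:
    """Classical randomized p-value P(X > x) + u * P(X = x) at theta0.

    u is a realization of U ~ Uniform(0, 1), drawn independently of the data.
    """
    return upper_tail(x + 1, n, theta0) + u * binom_pmf(x, n, theta0)


def size_nonrandomized(n: int, theta: float, theta0: float, alpha: float) -> float:
    """Exact P(p-value <= alpha) when the true parameter is theta."""
    return sum(
        binom_pmf(x, n, theta)
        for x in range(n + 1)
        if nonrandomized_p(x, n, theta0) <= alpha
    )


def size_randomized(n: int, theta: float, theta0: float, alpha: float) -> float:
    """Exact P(randomized p-value <= alpha) when the true parameter is theta.

    For each x, the event {tail(x+1) + U * pmf0(x) <= alpha} has probability
    (alpha - tail(x+1)) / pmf0(x), clipped to [0, 1]. Requires 0 < theta0 < 1.
    """
    total = 0.0
    for x in range(n + 1):
        pmf0 = binom_pmf(x, n, theta0)
        frac = (alpha - upper_tail(x + 1, n, theta0)) / pmf0
        total += binom_pmf(x, n, theta) * min(1.0, max(0.0, frac))
    return total


if __name__ == "__main__":
    n, theta0, alpha = 10, 0.5, 0.05
    for theta in (0.5, 0.3):  # LFC, then a null parameter that is not an LFC
        print(
            theta,
            size_nonrandomized(n, theta, theta0, alpha),
            size_randomized(n, theta, theta0, alpha),
        )
```

At the LFC, the randomized version rejects with probability exactly $\alpha$, whereas the non-randomized version rejects less often because of discreteness. When the true parameter lies strictly inside the null, this classical randomization alone still leaves the test conservative, which is the second source of conservativeness described in the abstract.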