Permutation procedures are common practice in hypothesis testing when the distributional assumptions about the test statistic are violated or its null distribution is unknown. With only a few permutations, empirical p-values lie on a coarse grid and may even be zero when the observed test statistic exceeds all permuted values. Such zero p-values are statistically invalid and hinder multiple testing correction. Parametric tail modeling with the Generalized Pareto Distribution (GPD) has been proposed to address this issue, but existing implementations can still yield zero p-values when the estimated shape parameter is negative, because the fitted distribution then has a finite upper bound. We introduce a method for accurate and zero-free p-value approximation in permutation testing, embedded in the permApprox workflow and R package. Building on GPD tail modeling, the method enforces a support constraint during parameter estimation to ensure valid extrapolation beyond the observed statistic, thereby strictly avoiding zero p-values. The workflow further integrates robust parameter estimation, data-driven threshold selection, and principled handling of hybrid p-values that are discrete in the bulk and continuous in the extreme tail. Extensive simulations using two-sample t-tests and Wilcoxon rank-sum tests show that permApprox produces accurate, robust, and zero-free p-value approximations across a wide range of sample and effect sizes. Applications to single-cell RNA-seq and microbiome data demonstrate its practical utility: permApprox yields smooth and interpretable p-value distributions even with few permutations. By resolving the zero-p-value problem while preserving accuracy and computational efficiency, permApprox enables reliable permutation-based inference in high-dimensional and computationally intensive settings.
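The zero-p-value problem and the role of a support constraint can be illustrated with a minimal Python sketch. It is not the permApprox implementation (which is an R package). All function names are hypothetical, and the GPD parameters are hand-picked for illustration rather than estimated from data. The tail p-value uses the usual exceedance-fraction-times-GPD-survival form, which is assumed here and may differ from the estimator used in permApprox.

```python
import math


def empirical_pvalue(t_obs, perm_stats):
    """Raw permutation p-value: fraction of permuted statistics >= t_obs."""
    return sum(1 for t in perm_stats if t >= t_obs) / len(perm_stats)


def gpd_survival(excess, shape, scale):
    """P(Y > excess) for a GPD fitted to excesses over a threshold."""
    if excess <= 0:
        return 1.0
    if abs(shape) < 1e-12:
        # Shape near zero: the GPD reduces to an exponential distribution.
        return math.exp(-excess / scale)
    base = 1.0 + shape * excess / scale
    if base <= 0.0:
        # Only possible for shape < 0: excess lies at or beyond the finite
        # upper bound -scale/shape, so the tail probability is exactly zero.
        return 0.0
    return base ** (-1.0 / shape)


def gpd_tail_pvalue(t_obs, threshold, n_exceed, n_perm, shape, scale):
    """Tail p-value: exceedance fraction times GPD survival (assumed form)."""
    return (n_exceed / n_perm) * gpd_survival(t_obs - threshold, shape, scale)


def min_shape_for_support(t_obs, threshold, scale):
    """Lower bound on the shape so that t_obs stays inside the GPD support.

    The support constraint is shape > -scale / (t_obs - threshold).
    """
    return -scale / (t_obs - threshold)


# Hypothetical toy data: 100 permuted statistics, observed value above all.
perm_stats = list(range(100))
t_obs, threshold = 120, 90
n_exceed = sum(1 for t in perm_stats if t > threshold)  # 9 exceedances

p_emp = empirical_pvalue(t_obs, perm_stats)
# Shape -0.2 with scale 5 gives an upper bound of 90 + 5/0.2 = 115 < 120.
p_unconstrained = gpd_tail_pvalue(t_obs, threshold, n_exceed, 100, -0.2, 5.0)
# A shape above the bound returned by min_shape_for_support keeps p > 0.
p_constrained = gpd_tail_pvalue(t_obs, threshold, n_exceed, 100, -0.1, 5.0)
```

In this toy setting the empirical p-value and the unconstrained GPD p-value are both zero, whereas any shape parameter above the bound from `min_shape_for_support` gives a strictly positive value.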