Consider a simultaneous hypothesis testing problem in which each hypothesis is associated with a test statistic. Suppose the null distribution of the test statistics is difficult to obtain, but some null hypotheses, referred to as internal negative controls, are known to be true. When it is reasonable to assume that the test statistics of the negative controls are exchangeable with those of the unknown true null hypotheses, we propose using a statistic's Rank Among Negative Controls (RANC) as a p-value for the corresponding hypothesis. We provide two theoretical perspectives on this proposal. First, we view the empirical distribution of the negative control statistics as an estimate of the null distribution. We use this view to show that, when the test statistics are exchangeable, the RANC p-values are individually valid and positive regression dependent on the subset of true nulls. Second, we study the empirical processes of the test statistics indexed by the rejection threshold. We use this to show that the Benjamini-Hochberg procedure applied to the RANC p-values may still control the false discovery rate when the test statistics are not exchangeable. The practical performance of our method is illustrated with numerical simulations and a real proteomic dataset.
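To make the proposal concrete, the following is a minimal Python sketch of rank-based p-values followed by the Benjamini-Hochberg step-up procedure. It rests on assumptions not stated in the abstract: larger statistics are taken to indicate stronger evidence against the null, and the p-value is taken to be (1 + number of negative controls at least as large as the statistic) / (m + 1), where m is the number of controls. The function names `ranc_pvalues` and `benjamini_hochberg` are illustrative and are not taken from the paper.

```python
import numpy as np


def ranc_pvalues(stats, controls):
    """Rank-among-negative-controls p-values (illustrative sketch).

    Assumes larger statistics are more extreme. The formula
    (1 + #{controls >= stat}) / (m + 1) is an assumed convention,
    not necessarily the exact definition used in the paper.
    """
    stats = np.asarray(stats, dtype=float)
    controls = np.sort(np.asarray(controls, dtype=float))
    m = controls.size
    # searchsorted(..., side="left") counts the controls strictly below
    # each statistic, so m minus that is the number of controls >= it.
    n_ge = m - np.searchsorted(controls, stats, side="left")
    return (1 + n_ge) / (m + 1)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean rejection mask from the standard BH step-up procedure."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Compare the k-th smallest p-value with alpha * k / n.
    below = pvals[order] <= alpha * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting the bound
        reject[order[: k + 1]] = True
    return reject
```

Under this convention the smallest attainable p-value is 1 / (m + 1), so the number of negative controls limits how small a p-value can be and therefore how many rejections BH can make at a given level.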