Unification of Rare/Weak Detection Models using Moderate Deviations Analysis and Log-Chisquared P-values

Rare and Weak models for multiple hypothesis testing assume that only a small proportion of the tested hypotheses concern non-null effects and that the individual effects are only moderately large, so they generally do not stand out individually, for example in a Bonferroni analysis. Such models have been studied in several settings: some studies focus on an underlying Gaussian means model for the hypotheses being tested, others on Poisson and binomial models. These seemingly different models share the following common asymptotic structure. Summarizing the evidence of each individual test by the negative logarithm of its P-value, the model is asymptotically equivalent to a situation in which most negative log P-values have a standard exponential distribution, but a small fraction might have an alternative distribution that is approximately noncentral chisquared with one degree of freedom. This log-chisquared approximation differs from the log-normal approximation of Bahadur, which is unsuitable for analyzing Rare and Weak multiple testing models. We characterize the asymptotic performance of global tests combining asymptotically log-chisquared P-values in terms of the chisquared mixture parameters: the scaling parameter controlling heteroscedasticity, the non-centrality parameter describing the effect size whenever it exists, and the parameter controlling the rarity of the non-null effects. In a phase space involving the last two parameters, we derive a region where all tests are asymptotically powerless. Outside this region, the Berk-Jones and Higher Criticism tests have maximal power. Inference techniques based on the minimal P-value, false discovery rate control, and Fisher's combination test have sub-optimal asymptotic phase diagrams.
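To make the setting concrete, here is a minimal sketch (not the authors' code) of the Gaussian-means instance of a Rare/Weak model together with the global tests the abstract compares. It assumes the classical parametrization in which each of n z-scores is non-null with probability eps = n^(-beta) and non-null means equal sqrt(2 r log n); the names `beta`, `r`, `alpha0` and all helper functions are hypothetical, and the Higher Criticism and Berk-Jones statistics use one common textbook form. Under the global null the one-sided P-values are uniform, so their negative logarithms are exactly standard exponential. The false-discovery-rate test is represented by its global-null equivalent, the Simes P-value: Benjamini-Hochberg at level q makes at least one rejection exactly when that value is at most q. The sketch does not reproduce the paper's log-chisquared parametrization (scaling and non-centrality parameters).

```python
import math
import random

# Illustrative sketch only: parameter names (n, beta, r, alpha0) and the exact
# statistic forms are assumptions, not the paper's implementation.


def one_sided_pvalue(x):
    """P(N(0,1) > x), clamped away from 0 so that log(p) stays finite."""
    return max(0.5 * math.erfc(x / math.sqrt(2.0)), 1e-300)


def sample_rare_weak(n, beta, r, rng):
    """Each of n z-scores is non-null with probability eps = n**(-beta);
    non-null z-scores have mean mu = sqrt(2 r log n), nulls have mean 0."""
    eps = n ** (-beta)
    mu = math.sqrt(2.0 * r * math.log(n))
    return [rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


def higher_criticism(pvals, alpha0=0.5):
    """HC* = max over i <= alpha0*n of sqrt(n)(i/n - p_(i)) / sqrt(p_(i)(1 - p_(i)))."""
    n = len(pvals)
    best = -math.inf
    for i, p in enumerate(sorted(pvals)[: max(1, int(alpha0 * n))], start=1):
        if 0.0 < p < 1.0:
            best = max(best, math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p)))
    return best


def bernoulli_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b), assuming 0 < b < 1."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def berk_jones(pvals):
    """One common one-sided form: max over i with p_(i) < i/n of n * K(i/n, p_(i))."""
    n = len(pvals)
    best = 0.0
    for i, p in enumerate(sorted(pvals), start=1):
        if p < i / n:
            best = max(best, n * bernoulli_kl(i / n, p))
    return best


def fisher_combination(pvals):
    """Fisher's statistic -2 * sum(log p_i); chi-squared with 2n df under the global null."""
    return -2.0 * sum(math.log(p) for p in pvals)


def bonferroni_min_p(pvals):
    """Minimal P-value test: Bonferroni-adjusted n * min(p_i), capped at 1."""
    return min(1.0, len(pvals) * min(pvals))


def simes(pvals):
    """Global-null form of Benjamini-Hochberg: BH at level q rejects at least one
    hypothesis iff min_i n * p_(i) / i <= q (the Simes P-value)."""
    n = len(pvals)
    return min(n * p / i for i, p in enumerate(sorted(pvals), start=1))


def global_tests(zscores):
    """Mean negative log P-value plus the five global statistics for one sample."""
    p = [one_sided_pvalue(z) for z in zscores]
    neg_log_p = [-math.log(v) for v in p]
    return {
        "mean_neg_log_p": sum(neg_log_p) / len(p),
        "HC": higher_criticism(p),
        "BJ": berk_jones(p),
        "Fisher": fisher_combination(p),
        "Bonferroni": bonferroni_min_p(p),
        "Simes": simes(p),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    null_z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    alt_z = sample_rare_weak(n, beta=0.6, r=0.3, rng=rng)
    for label, zs in (("global null", null_z), ("rare/weak", alt_z)):
        stats = global_tests(zs)
        print(label, {k: round(v, 3) for k, v in stats.items()})
```

Running the script prints, for a null sample and a rare/weak sample, the mean negative log P-value (close to 1 under the null, as expected for a standard exponential) and the five statistics. Comparing power across the (beta, r) plane would additionally require Monte Carlo calibration of each statistic's null quantiles at the chosen n, which this sketch leaves out.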


Related content

MoDELS


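MODELS, the ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems, is the premier conference series for model-driven software and systems engineering and is organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability. Official website: http://www.modelsconference.org/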

[CVPR 2022] Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements

Zhuanzhi Member Services

25+ reads · March 3, 2022

[NeurIPS2021] GraphFormers: GNN-nested Transformer models for textual graph representation learning

Zhuanzhi Member Services

46+ reads · November 24, 2021

[Amazon, WWW2020] Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

Zhuanzhi Member Services

15+ reads · February 1, 2020

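[WSDM2020] Beyond statistical relations: integrating knowledge relations into style correlations for multi-label music style classification (PDF attached)

Zhuanzhi Member Services

18+ reads · November 23, 2019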