The analysis of large-scale datasets, especially in biomedical contexts, frequently involves principled screening of multiple hypotheses. The celebrated two-group model jointly models the distribution of the test statistics as a mixture of two competing densities, the null and the alternative distributions. We investigate the use of weighted densities, and in particular non-local densities, as working alternative distributions that enforce separation from the null and thus refine the screening procedure. We show that, for a fixed mixture proportion, these weighted alternatives improve several operating characteristics of the resulting tests, such as the Bayesian false discovery rate, relative to a local, unweighted likelihood approach. We propose parametric and nonparametric model specifications, along with efficient samplers for posterior inference. Through a simulation study, we compare our model with both well-established and state-of-the-art alternatives in terms of various operating characteristics. Finally, to illustrate the versatility of our method, we conduct three differential expression analyses on publicly available datasets from genomic studies of heterogeneous nature.
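The following minimal Python sketch makes the two-group model concrete. It is not the authors' model or sampler, and it rests on several illustrative assumptions. The null for each z-statistic is N(0, 1). The mixture proportion pi0 is fixed. The local alternative is N(0, s2). The non-local alternative is a first-order normal moment density, (z^2 / s2) * N(z; 0, s2), which vanishes at z = 0. Each statistic's posterior null probability (local false discovery rate) is pi0 f0(z) / (pi0 f0(z) + (1 - pi0) f1(z)). The Bayesian FDR of a rejection set is the average of these probabilities over the set. All function names and the greedy rejection rule are hypothetical.

```python
import math


def normal_pdf(z: float, var: float) -> float:
    """Density of N(0, var) evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def local_alt(z: float, s2: float) -> float:
    """Local (unweighted) alternative: N(0, s2), which has positive mass near the null."""
    return normal_pdf(z, s2)


def nonlocal_alt(z: float, s2: float) -> float:
    """Assumed non-local alternative: the moment density (z^2 / s2) * N(z; 0, s2).

    It integrates to 1 because E[Z^2] = s2 under N(0, s2), and it is exactly 0 at z = 0,
    so it enforces separation from the N(0, 1) null.
    """
    return (z * z / s2) * normal_pdf(z, s2)


def local_fdr(z: float, pi0: float, alt, s2: float) -> float:
    """Posterior probability that z comes from the null under the two-group mixture."""
    null = pi0 * normal_pdf(z, 1.0)
    return null / (null + (1.0 - pi0) * alt(z, s2))


def bfdr_rejections(lfdrs, alpha: float):
    """Largest set of smallest local fdr values whose mean (the Bayesian FDR) is <= alpha.

    Sorted ascending, the running mean is nondecreasing, so stopping at the first
    exceedance yields the largest admissible set. Returns sorted indices.
    """
    order = sorted(range(len(lfdrs)), key=lambda i: lfdrs[i])
    total = 0.0
    chosen = []
    for k, i in enumerate(order, start=1):
        total += lfdrs[i]
        if total / k > alpha:
            break
        chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    zs = [0.1, -0.5, 1.5, 2.8, -3.4, 4.0]  # made-up statistics for illustration
    pi0, s2, alpha = 0.9, 4.0, 0.05         # assumed, not estimated
    for name, alt in [("local", local_alt), ("non-local", nonlocal_alt)]:
        lf = [local_fdr(z, pi0, alt, s2) for z in zs]
        print(name, [round(v, 3) for v in lf], bfdr_rejections(lf, alpha))
```

Relative to the local N(0, s2), the moment density reweights the alternative by z^2 / s2. Statistics with |z| < sqrt(s2) therefore receive a larger posterior null probability, and more extreme statistics receive a smaller one. This is the kind of separation from the null that the non-local working alternative is meant to enforce.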