Learning from the collective wisdom of crowds enhances the transparency of scientific findings by incorporating diverse perspectives into the decision-making process. Synthesizing such collective wisdom is related to the statistical notion of fusion learning from multiple data sources or studies. However, fusing inferences from diverse sources is challenging since cross-source heterogeneity and potential data-sharing complicate statistical inference. Moreover, studies may rely on disparate designs, employ widely different modeling techniques for inferences, and prevailing data privacy norms may forbid sharing even summary statistics across the studies for an overall analysis. In this paper, we propose an Integrative Ranking and Thresholding (IRT) framework for fusion learning in multiple testing. IRT operates under the setting where from each study a triplet is available: the vector of binary accept-reject decisions on the tested hypotheses, the study-specific False Discovery Rate (FDR) level and the hypotheses tested by the study. Under this setting, IRT constructs an aggregated, nonparametric, and discriminatory measure of evidence against each null hypotheses, which facilitates ranking the hypotheses in the order of their likelihood of being rejected. We show that IRT guarantees an overall FDR control under arbitrary dependence between the evidence measures as long as the studies control their respective FDR at the desired levels. Furthermore, IRT synthesizes inferences from diverse studies irrespective of the underlying multiple testing algorithms employed by them. While the proofs of our theoretical statements are elementary, IRT is extremely flexible, and a comprehensive numerical study demonstrates that it is a powerful framework for pooling inferences.
翻译:从群体集体智慧中学习,通过将多元视角纳入决策过程,提升了科学发现的透明度。整合此类集体智慧与统计学中来自多数据源或多研究的融合学习概念相关。然而,由于跨源异质性和潜在的数据共享,融合来自不同来源的推断颇具挑战,这进一步使统计推断复杂化。此外,不同研究可能采用不同设计、运用截然不同的建模技术进行推断,且现行的数据隐私规范可能禁止跨研究共享甚至汇总统计数据以进行整体分析。本文提出一种面向多重假设检验中融合学习的集成排序与阈值(IRT)框架。IRT在每项研究提供三类信息的设定下运行:关于检验假设的二元接受-拒绝决策向量、研究特异的错误发现率(FDR)水平以及该研究检验的假设集合。在此设定下,IRT构建一种聚合的、非参数化的且具区分度的证据度量,用于评估各原假设的证据强度,从而便于按假设被拒绝的可能性进行排序。我们证明,只要各研究在原设定水平上控制自身FDR,无论各证据度量间存在何种任意相关性,IRT均可保证全局FDR控制。此外,无论各研究采用何种底层多重检验算法,IRT均能综合不同研究的推断结果。尽管理论证明的推导较为基础,但IRT具有极高的灵活性,综合数值实验表明,它是一种强大的推断汇合框架。