We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem arises, for example, when conducting biological experiments to identify differentially expressed genes in a disease process. This work builds on the generalized $\alpha$-investing framework, which enables control of the false discovery rate in a sequential testing setting. We provide a theoretical analysis of the long-term asymptotic behavior of $\alpha$-wealth, which motivates including sample size in the $\alpha$-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected $\alpha$-wealth reward (ERO) and provides an optimal sample size for each test. Empirical results show that, when the sample size is fixed at $n=1$, a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.