We consider the problem of sequential multiple hypothesis testing with nontrivial data collection cost. This problem arises, for example, when conducting biological experiments to identify differentially expressed genes in a disease process. This work builds on the generalized $\alpha$-investing framework, which enables control of the false discovery rate in a sequential testing setting. A theoretical analysis of the long-term asymptotic behavior of $\alpha$-wealth motivates incorporating sample size into the $\alpha$-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected return (ERO) of $\alpha$-wealth and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples across many tests. Finally, experiments on real data sets from biological experiments show that cost-aware ERO produces actionable decisions to conduct tests at optimal sample sizes.