We consider a variant of sequential testing by betting in which, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of them. We consider the composite global null hypothesis $\mathscr{P}$ that all arms are null in a certain sense (e.g. all dosages of a treatment are ineffective), and we are interested in rejecting $\mathscr{P}$ in favor of a composite alternative $\mathscr{Q}$ under which at least one arm is non-null (e.g. there exists an effective treatment dosage). We posit an optimality desideratum that we describe informally as follows: even if several arms are non-null, we seek $e$-processes and sequential tests whose performance is as strong as that of procedures with oracle knowledge of which arm generates the most evidence against $\mathscr{P}$. Formally, we generalize the notions of log-optimality and expected rejection time optimality to more than one arm, obtaining matching lower and upper bounds for both. A key technical device in this optimality analysis is a modified upper-confidence-bound-like algorithm for rewards that are unobservable but sufficiently "estimable". In designing this algorithm, we derive nonasymptotic concentration inequalities for optimal wealth growth rates in the sense of Kelly [1956], which may be of independent interest.
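To make the setting concrete, the following is a minimal sketch of testing by betting over several arms. It is not the algorithm analyzed in this work. It assumes observations in $[0,1]$ and takes the global null to be that every arm has mean at most `m0`. The function name `bandit_betting_test`, the plug-in approximation to the Kelly bet, and the standard $\sqrt{2\log t/n}$ exploration bonus are all illustrative assumptions. Because the bet and the arm choice depend only on past data, the wealth is a nonnegative supermartingale under this null, so rejecting once it reaches $1/\alpha$ controls the type-I error at level $\alpha$ by Ville's inequality.

```python
import math
import random


def bandit_betting_test(arms, m0=0.5, alpha=0.05, horizon=2000, seed=0):
    """Hypothetical sketch: returns (rejected, stopping_time, wealth).

    `arms` is a list of callables; each takes a random.Random instance
    and returns one observation in [0, 1] (an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = len(arms)
    obs = [[] for _ in range(k)]  # past observations, one list per arm
    lam_max = 0.5 / m0            # cap keeps every wealth factor >= 0.5
    wealth = 1.0

    def bet(xs):
        # Plug-in approximation to the Kelly bet from past data only;
        # zero when there is no data or no estimated edge over m0.
        if not xs:
            return 0.0
        edge = sum(xs) / len(xs) - m0
        second = sum((x - m0) ** 2 for x in xs) / len(xs)
        return min(max(edge / (second + 1e-12), 0.0), lam_max)

    def index(a, t):
        # The growth rate is not observed directly, so it is estimated
        # from the arm's history and inflated by an exploration bonus.
        xs = obs[a]
        if not xs:
            return math.inf  # pull every arm once first
        lam = bet(xs)
        growth = sum(math.log1p(lam * (x - m0)) for x in xs) / len(xs)
        return growth + math.sqrt(2.0 * math.log(t) / len(xs))

    for t in range(1, horizon + 1):
        a = max(range(k), key=lambda j: index(j, t))
        lam = bet(obs[a])                # fixed before seeing x
        x = arms[a](rng)
        wealth *= 1.0 + lam * (x - m0)   # betting update
        obs[a].append(x)
        if wealth >= 1.0 / alpha:        # Ville threshold
            return True, t, wealth
    return False, horizon, wealth
```

For instance, `bandit_betting_test([lambda r: float(r.random() < 0.5), lambda r: float(r.random() < 0.8)])` runs the sketch on two Bernoulli arms, one null and one non-null. The index rule is what lets the procedure concentrate its pulls on the arm with the largest estimated growth rate without knowing in advance which arm that is.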