We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
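To make the last-batch result concrete, the following is a minimal sketch, not the paper's procedure. It assumes a hypothetical two-arm, two-batch experiment with unit-variance Gaussian outcomes, in which the second-batch assignment probability is chosen from first-batch data, and it builds a standard normal-approximation interval for the difference in arm means from the last batch alone. The function names, the 0.7/0.3 assignment rule, and the batch size are illustrative assumptions. The sketch shows only what last-batch-only inference looks like; it does not demonstrate the optimality claim or the refinements available under location invariance or polyhedral conditioning.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def last_batch_ci(y_a, y_b, level=0.95):
    """Normal-approximation CI for mean(y_a) - mean(y_b), using one batch only."""
    diff = fmean(y_a) - fmean(y_b)
    se = math.sqrt(variance(y_a) / len(y_a) + variance(y_b) / len(y_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def draw_batch(rng, mu, p_a, n):
    """Draw one batch: round(p_a * n) units on arm A, the rest on arm B."""
    n_a = round(p_a * n)
    y_a = [rng.gauss(mu[0], 1.0) for _ in range(n_a)]
    y_b = [rng.gauss(mu[1], 1.0) for _ in range(n - n_a)]
    return y_a, y_b


def two_batch_experiment(rng, mu=(0.3, 0.0), n=400):
    """Run a hypothetical two-batch experiment and return (p_a2, CI)."""
    # Batch 1: balanced assignment.
    y_a1, y_b1 = draw_batch(rng, mu, 0.5, n)
    # Adaptive rule (an assumption for illustration): favour the arm
    # that looks better after batch 1.
    p_a2 = 0.7 if fmean(y_a1) > fmean(y_b1) else 0.3
    # Batch 2 is the last batch; the interval uses only its outcomes.
    y_a2, y_b2 = draw_batch(rng, mu, p_a2, n)
    return p_a2, last_batch_ci(y_a2, y_b2)


rng = random.Random(0)
p_a2, (lo, hi) = two_batch_experiment(rng)
print(f"realized P(arm A) in last batch: {p_a2}")
print(f"last-batch 95% CI for the mean difference: ({lo:.3f}, {hi:.3f})")
```

Because the second-batch outcomes are drawn fresh once the assignment probability is fixed, the interval is computed exactly as in a non-adaptive experiment with that probability; the first batch enters only through the choice of `p_a2`.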