We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
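As a purely illustrative sketch of the first result, the simulation below runs a toy two-arm, two-batch experiment in which the second-batch assignment probability is chosen from first-batch data, and then forms a confidence interval for the difference in arm means from the last batch alone. Everything here is an assumption made for illustration rather than the paper's setup or procedure: Gaussian outcomes with known unit variance, a fixed two-batch design, the 0.3/0.7 tilting rule, and all function names (`run_batch`, `last_batch_ci`, `two_batch_experiment`, `conditional_coverage`).

```python
import math
import random


def run_batch(mu, p_arm1, n, rng):
    """Draw one batch: each unit goes to arm 1 with probability p_arm1."""
    outcomes = {0: [], 1: []}
    for _ in range(n):
        arm = 1 if rng.random() < p_arm1 else 0
        # Assumption: Gaussian outcomes with known unit variance.
        outcomes[arm].append(rng.gauss(mu[arm], 1.0))
    return outcomes


def mean(xs):
    return sum(xs) / len(xs)


def last_batch_ci(batch, z=1.96):
    """Known-variance z-interval for mu[1] - mu[0] using one batch only."""
    n0, n1 = len(batch[0]), len(batch[1])
    est = mean(batch[1]) - mean(batch[0])
    se = math.sqrt(1.0 / n0 + 1.0 / n1)
    return est, (est - z * se, est + z * se)


def two_batch_experiment(mu, n, rng):
    """Batch 1 is balanced; batch 2 tilts toward the arm that looked better."""
    first = run_batch(mu, 0.5, n, rng)
    # Hypothetical adaptive rule: the only use made of first-batch data.
    p_next = 0.7 if mean(first[1]) > mean(first[0]) else 0.3
    last = run_batch(mu, p_next, n, rng)
    return p_next, last_batch_ci(last)


def conditional_coverage(mu, n, reps, seed=0):
    """Coverage of the last-batch interval, split by the realized p_next."""
    rng = random.Random(seed)
    target = mu[1] - mu[0]
    tally = {}  # realized p_next -> [times covered, total runs]
    for _ in range(reps):
        p_next, (_, (lo, hi)) = two_batch_experiment(mu, n, rng)
        cell = tally.setdefault(p_next, [0, 0])
        cell[0] += lo <= target <= hi
        cell[1] += 1
    return {p: covered / total for p, (covered, total) in tally.items()}


if __name__ == "__main__":
    for p, cov in sorted(conditional_coverage((0.0, 0.1), 200, 2000).items()):
        print(f"realized p_next={p}: coverage of last-batch CI = {cov:.3f}")
```

Because the last batch is drawn after the assignment probability is fixed, its outcomes are independent of the data that drove the adaptive choice, so the interval should cover at close to the nominal rate within each realized value of `p_next`. The sketch does not attempt the location-invariant or polyhedral refinements described above, which extract additional information from earlier batches.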