An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

Experimentation lets managers rigorously quantify the value of a change and determine whether it yields a statistically significant improvement over the status quo, thereby strengthening their decision-making. Many companies now mandate that all changes undergo experimentation, which presents two challenges: (1) reducing the risk and cost of experimentation by minimizing the proportion of customers assigned to the inferior treatment, and (2) increasing experimentation velocity by enabling managers to stop experiments as soon as results are statistically significant. This paper addresses both challenges simultaneously by proposing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, the MAD "mixes" any bandit algorithm with a Bernoulli design such that, at each time step, the probability that a customer is assigned via the Bernoulli design is controlled by a user-specified deterministic sequence that can converge to zero. The sequence enables managers to directly and interpretably control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate at which the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime-valid and demonstrate that the MAD is guaranteed to have a finite stopping time in the presence of a true non-zero ATE. Hence, the MAD allows managers to stop experiments early when a significant ATE is detected while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.
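
The mixing step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every name and numeric choice in it is an assumption made for illustration: the polynomially decaying sequence `delta` and its exponent, the fair-coin Bernoulli(1/2) design, the epsilon-greedy stand-in for "any bandit algorithm", and the plain inverse-probability-weighted (IPW) average used as the ATE point estimate. The confidence sequence itself is not reproduced here.

```python
import random


def delta(t, exponent=0.24):
    """Hypothetical mixing sequence: decays polynomially to zero (t >= 1)."""
    return t ** (-exponent)


def mad_treatment_prob(delta_t, bandit_prob):
    """Treatment probability after mixing a Bernoulli(1/2) design with the bandit.

    With probability delta_t the unit is assigned by a fair coin; otherwise
    the bandit's own treatment probability is used.
    """
    return delta_t * 0.5 + (1.0 - delta_t) * bandit_prob


def greedy_bandit_prob(sums, counts, eps=0.05):
    """Stand-in bandit (assumption): epsilon-greedy probability of arm 1."""
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    if means[1] == means[0]:
        return 0.5
    return 1.0 - eps / 2 if means[1] > means[0] else eps / 2


def run_mad(horizon, arm_means, seed=0):
    """Simulate binary rewards under the mixed design.

    Returns the IPW estimate of the ATE, the per-step treatment
    probabilities, and the number of units assigned to each arm.
    """
    rng = random.Random(seed)
    sums, counts = [0.0, 0.0], [0, 0]
    probs, ipw_terms = [], []
    for t in range(1, horizon + 1):
        p1 = mad_treatment_prob(delta(t), greedy_bandit_prob(sums, counts))
        w = 1 if rng.random() < p1 else 0                 # assignment
        y = 1.0 if rng.random() < arm_means[w] else 0.0   # observed reward
        sums[w] += y
        counts[w] += 1
        # IPW term: reweight the outcome by the known assignment probability.
        ipw_terms.append(y / p1 if w == 1 else -y / (1.0 - p1))
        probs.append(p1)
    return sum(ipw_terms) / horizon, probs, counts


if __name__ == "__main__":
    ate_hat, probs, counts = run_mad(5000, arm_means=(0.3, 0.5))
    print("IPW estimate of the ATE:", round(ate_hat, 3))
    print("units per arm (control, treatment):", counts)
    print("final treatment probability:", round(probs[-1], 3))
```

In this sketch the treatment probability at step t always lies between `delta(t) / 2` and `1 - delta(t) / 2`, so the IPW weights stay finite even when the bandit has nearly committed to one arm. A slower-decaying sequence keeps more randomization (better precision, more regret), while a faster-decaying one defers more to the bandit, which is the trade-off the abstract attributes to the sequence.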
