Weak Signal Asymptotics for Sequentially Randomized Experiments

Source: arXiv. Forthcoming in Management Science. An earlier draft of this paper was circulated under the title "Diffusion Asymptotics for Sequential Experiments." Xu Kuang published under a different full name in earlier versions of this manuscript. Please use X. Kuang and S. Wager when citing this paper.

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive a refined, instance-specific characterization of the stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments, including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.
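To make the scaling regime concrete, the following is a minimal simulation sketch, not the authors' construction. It assumes Gaussian rewards with unit variance, arm means set to `deltas[k] / sqrt(n)`, and Gaussian Thompson sampling with a zero-mean prior of variance `prior_var` placed directly on each arm mean. The function name `thompson_weak_signal` and all of its parameters are hypothetical, introduced only for illustration.

```python
import math
import random


def thompson_weak_signal(n, deltas, prior_var=1.0, seed=0):
    """Run Gaussian Thompson sampling for n steps with arm means deltas[k] / sqrt(n).

    Returns (regret / sqrt(n), pull counts). Unit-variance Gaussian rewards and
    the N(0, prior_var) prior on each arm mean are assumptions of this sketch.
    """
    rng = random.Random(seed)
    k = len(deltas)
    means = [d / math.sqrt(n) for d in deltas]  # weak-signal scaling of the means
    best = max(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    regret = 0.0
    for _ in range(n):
        draws = []
        for a in range(k):
            # Conjugate Gaussian update: posterior precision and mean.
            post_prec = 1.0 / prior_var + counts[a]
            post_mean = sums[a] / post_prec
            draws.append(rng.gauss(post_mean, math.sqrt(1.0 / post_prec)))
        # Pull the arm whose posterior draw is largest.
        arm = max(range(k), key=lambda a: draws[a])
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret / math.sqrt(n), counts


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        scaled_regret, counts = thompson_weak_signal(n, [0.0, 2.0])
        print(n, round(scaled_regret, 3), counts)
```

Because each per-step gap is of order $1/\sqrt{n}$, the cumulative regret is divided by $\sqrt{n}$ so that the reported quantity stays of order one as $n$ grows. Increasing `prior_var` is one rough way to mimic a less informative prior in this sketch.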

