Weak Signal Asymptotics for Sequentially Randomized Experiments

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale as $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive a refined, instance-specific characterization of the stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.
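
As a rough illustration of the scaling regime described above, the following sketch simulates two-armed Gaussian Thompson sampling with mean rewards of order $1/\sqrt{n}$. It is not the paper's construction: the function name `simulate_thompson`, the unit noise variance, the $N(0, \nu^2/n)$ prior on each arm mean (parameter `prior_var` standing for $\nu^2$), and the choice of recorded quantities are all assumptions made for illustration.

```python
import math
import random


def simulate_thompson(n, theta, prior_var=1.0, seed=0):
    """Two-armed Gaussian Thompson sampling with gaps of order 1/sqrt(n).

    Assumptions (illustrative, not taken from the paper):
      - arm k pays theta[k] / sqrt(n) plus independent N(0, 1) noise;
      - each arm mean has an independent N(0, prior_var / n) prior.
    Returns (regret / sqrt(n), path), where path[t] holds the fraction
    of the n steps spent on each arm after step t.
    """
    rng = random.Random(seed)
    means = [t / math.sqrt(n) for t in theta]
    best = max(means)
    counts = [0, 0]
    sums = [0.0, 0.0]
    regret = 0.0
    path = []
    for _ in range(n):
        draws = []
        for k in range(2):
            # Conjugate Gaussian update: prior precision n / prior_var,
            # plus one unit of precision per observation of arm k.
            precision = n / prior_var + counts[k]
            post_mean = sums[k] / precision
            post_sd = math.sqrt(1.0 / precision)
            draws.append(rng.gauss(post_mean, post_sd))
        # Thompson sampling: play the arm with the larger posterior draw.
        arm = 0 if draws[0] >= draws[1] else 1
        reward = means[arm] + rng.gauss(0.0, 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
        path.append((counts[0] / n, counts[1] / n))
    return regret / math.sqrt(n), path


if __name__ == "__main__":
    scaled_regret, path = simulate_thompson(n=2000, theta=(0.0, 2.0))
    print("scaled regret:", scaled_regret)
    print("final pull fractions:", path[-1])
```

Because each per-step gap equals the difference of the `theta` entries divided by $\sqrt{n}$, the returned scaled regret is that difference times the fraction of pulls spent on the inferior arm, so it always lies between 0 and the difference itself. Rerunning with larger `n` and fixed `theta` gives a crude way to see the scaled quantities stabilize, which is the kind of behavior a diffusion limit would describe.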

