The Fragility of Optimized Bandit Algorithms - 专知论文

会员服务 ·

0

赌博机/老虎机 · 优化器 · ARM · 次最优 · 设计 ·

2023 年 5 月 30 日

The Fragility of Optimized Bandit Algorithms

翻译：优化赌博机算法的脆弱性

Lin Fan,Peter W. Glynn

Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also provide a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution.

翻译：大量关于赌博机算法最优设计的文献基于期望遗憾最小化。众所周知，在特定指数族上最优的设计能够实现对数增长的期望遗憾（随臂拉动次数增长），其速率由赖-罗宾斯下界决定。本文证明，当使用此类优化设计时，相应算法的遗憾分布必然具有极重的尾部，具体表现为截断柯西分布。此外，对于$p>1$，遗憾分布的$p$阶矩增长速度远快于多对数增长，尤其达到总臂拉动次数的幂次增长。我们进一步展示，优化UCB赌博机设计在另一层面上同样脆弱：即使问题存在轻微误设，遗憾增长速度也可能远快于传统理论预测。我们的论证基于标准测度变换思想，表明遗憾超过预期的最可能情形是：最优臂在最初几次拉动中回报低于平均值，导致算法误判该臂为次优。为缓解所揭示的脆弱性问题，我们证明可通过修改UCB算法确保对误设具有期望的鲁棒性。在此过程中，我们给出了UCB探索量与遗憾分布尾部指数之间的精确权衡关系。

0

相关内容

赌博机/老虎机

赌博机/老虎机

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

专知会员服务

60+阅读 · 2022年5月5日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

食品风险残留物快速检测方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

可压缩湍流粒子输运的拉格朗日（Lagrangian）研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

提高基于Bulk-Micromegas结构的快中子探测器探测效率和位置分辨的方法的研究

国家自然科学基金

0+阅读 · 2012年12月31日

PPARγ-1SUMO化修饰在高（血）糖诱导血管内皮胰岛素抵抗中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于ForCES的软件定义网络（SDN）研究

国家自然科学基金

1+阅读 · 2012年12月31日

两类Monge-Ampere方程问题的研究

国家自然科学基金

1+阅读 · 2012年12月31日

钢轨缺陷的超声导波检测方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

调节韧带肌成纤维细胞转分化蛋白网络在损伤愈合中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

Extension of Switch Point Algorithm to Boundary-Value Problems

Arxiv

0+阅读 · 2023年7月19日

Visibility and Separability for a Declarative Linearizability Proof of the Timestamped Stack: Extended Version

Arxiv

0+阅读 · 2023年7月18日

Optimal storage codes on graphs with fixed locality

Arxiv

0+阅读 · 2023年7月17日

A subgradient method with constant step-size for $\ell_1$-composite optimization

Arxiv

0+阅读 · 2023年7月17日

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Arxiv

0+阅读 · 2023年7月16日

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

Arxiv

0+阅读 · 2023年7月15日

A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

Arxiv

0+阅读 · 2023年7月15日

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

Arxiv

0+阅读 · 2023年7月15日

A `periodic table' of modes and maximum a posteriori estimators

Arxiv

0+阅读 · 2023年7月14日

An Incremental Span-Program-Based Algorithm and the Fine Print of Quantum Topological Data Analysis

Arxiv

0+阅读 · 2023年7月13日

VIP会员

文章信息

相关主题

赌博机/老虎机

最新内容

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

3+阅读 · 今天14:49

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

1+阅读 · 今天14:25

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

2+阅读 · 今天13:57

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

2+阅读 · 今天13:27

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

11+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

10+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

8+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

6+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

8+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

9+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

8+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

10+阅读 · 7月20日

相关VIP内容

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

Meta最新WWW2022《联邦计算导论》教程，附77页ppt

专知会员服务

60+阅读 · 2022年5月5日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

106+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

对抗环境下超视距目标打击的情报支援

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Extension of Switch Point Algorithm to Boundary-Value Problems

Arxiv

0+阅读 · 2023年7月19日

Visibility and Separability for a Declarative Linearizability Proof of the Timestamped Stack: Extended Version

Arxiv

0+阅读 · 2023年7月18日

Optimal storage codes on graphs with fixed locality

Arxiv

0+阅读 · 2023年7月17日

A subgradient method with constant step-size for $\ell_1$-composite optimization

Arxiv

0+阅读 · 2023年7月17日

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Arxiv

0+阅读 · 2023年7月16日

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

Arxiv

0+阅读 · 2023年7月15日

A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

Arxiv

0+阅读 · 2023年7月15日

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

Arxiv

0+阅读 · 2023年7月15日

A `periodic table' of modes and maximum a posteriori estimators

Arxiv

0+阅读 · 2023年7月14日

An Incremental Span-Program-Based Algorithm and the Fine Print of Quantum Topological Data Analysis

Arxiv

0+阅读 · 2023年7月13日

相关基金

食品风险残留物快速检测方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

可压缩湍流粒子输运的拉格朗日（Lagrangian）研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

提高基于Bulk-Micromegas结构的快中子探测器探测效率和位置分辨的方法的研究

国家自然科学基金

0+阅读 · 2012年12月31日

PPARγ-1SUMO化修饰在高（血）糖诱导血管内皮胰岛素抵抗中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于ForCES的软件定义网络（SDN）研究

国家自然科学基金

1+阅读 · 2012年12月31日

两类Monge-Ampere方程问题的研究

国家自然科学基金

1+阅读 · 2012年12月31日

钢轨缺陷的超声导波检测方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

调节韧带肌成纤维细胞转分化蛋白网络在损伤愈合中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员