Continuous-in-time Limit for Bayesian Bandits - 专知 (Zhuanzhi) Papers

Bandits · Optimizer · Rescaling · Continuity · Computational cost

May 10, 2023


Yuhua Zhu, Zachary Izzo, Lexing Ying

This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges toward a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit problems with large horizons. Our method has the added benefit that its computational cost does not increase as the horizon increases.
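To make the computational difficulty mentioned in the abstract concrete, here is a minimal sketch of the exact (discrete) Bayes-optimal solution for one small case. It is not the paper's HJB-based method. It assumes a two-armed Bernoulli bandit with independent Beta(1, 1) priors, and the function name `solve_two_armed_bandit` is hypothetical. The dynamic program runs over posterior counts, so the number of states it visits grows like the fourth power of the horizon, which is the kind of cost the continuous-in-time limit is meant to avoid.

```python
from functools import lru_cache
from math import comb


def solve_two_armed_bandit(horizon):
    """Exact Bayes-optimal expected reward for a two-armed Bernoulli bandit.

    Assumption (not from the paper): each arm's success probability has an
    independent Beta(1, 1) prior. Returns (optimal expected total reward,
    number of posterior states visited by the dynamic program).
    """

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # State = success/failure counts observed so far on each arm.
        pulls_left = horizon - (s1 + f1 + s2 + f2)
        if pulls_left == 0:
            return 0.0
        # Posterior mean of each arm under Beta(1 + s, 1 + f).
        p1 = (1 + s1) / (2 + s1 + f1)
        p2 = (1 + s2) / (2 + s2 + f2)
        # Expected reward-to-go if we pull arm 1 or arm 2 now.
        q1 = p1 * (1 + value(s1 + 1, f1, s2, f2)) + (1 - p1) * value(s1, f1 + 1, s2, f2)
        q2 = p2 * (1 + value(s1, f1, s2 + 1, f2)) + (1 - p2) * value(s1, f1, s2, f2 + 1)
        return max(q1, q2)

    best = value(0, 0, 0, 0)
    return best, value.cache_info().currsize


if __name__ == "__main__":
    for n in (5, 10, 20):
        best, states = solve_two_armed_bandit(n)
        # Under uniform priors, E[max(mu1, mu2)] = 2/3, so an oracle earns 2n/3.
        regret = 2 * n / 3 - best
        print(f"horizon={n:3d}  Bayesian regret={regret:.4f}  "
              f"states={states} (C(n+4, 4)={comb(n + 4, 4)})")
```

Running the script prints the Bayesian regret of the optimal policy and the state count for a few horizons. With more arms the state space gains two dimensions per arm, which is why exact computation quickly becomes impractical.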

