Discounted Thompson Sampling for Non-Stationary Bandit Problems - 专知论文

会员服务 ·

0

赌博机/老虎机 · 样本 · 衰减系数 · 相互独立的 · state-of-the-art ·

2023 年 5 月 22 日

Discounted Thompson Sampling for Non-Stationary Bandit Problems

翻译：折扣汤普森采样应对非平稳老虎机问题

Han Qi,Yue Wang,Li Zhu

Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MAB are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly based on unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm passively adapts to changes by incorporating a discounted factor into Thompson Sampling. DS-TS method has been experimentally validated, but analysis of the regret upper bound is currently lacking. Under mild assumptions, we show that DS-TS with Gaussian priors can achieve nearly optimal regret bound on the order of $\tilde{O}(\sqrt{TB_T})$ for abruptly changing and $\tilde{O}(T^{\beta})$ for smoothly changing, where $T$ is the number of time steps, $B_T$ is the number of breakpoints, $\beta$ is associated with the smoothly changing environment and $\tilde{O}$ hides the parameters independent of $T$ as well as logarithmic terms. Furthermore, empirical comparisons between DS-TS and other non-stationary bandit algorithms demonstrate its competitive performance. Specifically, when prior knowledge of the maximum expected reward is available, DS-TS has the potential to outperform state-of-the-art algorithms.

翻译：非平稳多臂老虎机（NS-MAB）问题近年来受到广泛关注。NS-MAB通常通过两种场景建模：突变型场景，其中奖励分布在特定时段内保持恒定，并在未知时间步发生突变；平滑变化型场景，其中奖励分布基于未知动态平滑演化。本文提出基于高斯先验的折扣汤普森采样（DS-TS）算法，以同时应对这两种非平稳设置。该算法通过将折扣因子引入汤普森采样实现被动适应变化。DS-TS方法已得到实验验证，但目前缺乏遗憾上界分析。在温和假设下，我们证明基于高斯先验的DS-TS可在突变型场景下达到近优的遗憾界$\tilde{O}(\sqrt{TB_T})$，并在平滑变化型场景下达到$\tilde{O}(T^{\beta})$，其中$T$为时间步数，$B_T$为断点数量，$\beta$与平滑变化环境相关，$\tilde{O}$隐藏了与$T$无关的参数及对数项。此外，DS-TS与其他非平稳老虎机算法的实证对比显示其具备竞争性性能。特别地，当已知最大期望奖励的先验信息时，DS-TS具有超越当前最优算法的潜力。

0

相关内容

赌博机/老虎机

赌博机/老虎机

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

肠道靶向性骨髓间充质干细胞通过重建肠道微生态来治疗实验性IBD的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

循经取穴治疗原发性高血压的宿主代谢-肠道微生物Cross-talk机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

不同生境条件下齿瓣石斛不同生长时期菌根真菌多样性研究

国家自然科学基金

0+阅读 · 2013年12月31日

量子自旋格子系统的拓扑序、量子动力学和量子quench

国家自然科学基金

0+阅读 · 2012年12月31日

MDSCs在动脉粥样硬化中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

次生布风层对分选流化床稳定性的协同作用

国家自然科学基金

0+阅读 · 2011年12月31日

结直肠癌细胞外基质的动态变化特征及其对上皮间质转化的作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

扰动可积非哈密顿系统的极限环分支

国家自然科学基金

0+阅读 · 2011年12月31日

生态恢复对红壤严重侵蚀地土壤水库重建的影响与机制

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

Range Avoidance for Constant-Depth Circuits: Hardness and Algorithms

Arxiv

0+阅读 · 2023年7月7日

PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs

Arxiv

0+阅读 · 2023年7月7日

BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

Arxiv

0+阅读 · 2023年7月7日

The computational asymptotics of Gaussian variational inference and the Laplace approximation

Arxiv

0+阅读 · 2023年7月5日

D-optimal Subsampling Design for Massive Data Linear Regression

Arxiv

0+阅读 · 2023年7月5日

Exact and Parameterized Algorithms for the Independent Cutset Problem

Arxiv

0+阅读 · 2023年7月5日

A $p$-step-ahead sequential adaptive algorithm for D-optimal nonlinear regression design

Arxiv

0+阅读 · 2023年7月5日

Conditional and Residual Methods in Scalable Coding for Humans and Machines

Arxiv

0+阅读 · 2023年7月4日

A Non-Classical Parameterization for Density Estimation Using Sample Moments

Arxiv

0+阅读 · 2023年7月4日

Optimal Surrogate Boundary Selection and Scalability Studies for the Shifted Boundary Method on Octree Meshes

Arxiv

0+阅读 · 2023年7月4日

VIP会员

文章信息

相关主题

赌博机/老虎机

相互独立的

state-of-the-art

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

0+阅读 · 53分钟前

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

0+阅读 · 刚刚

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

4+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

5+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

9+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

6+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

9+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

13+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

7+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

9+阅读 · 6月17日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Range Avoidance for Constant-Depth Circuits: Hardness and Algorithms

Arxiv

0+阅读 · 2023年7月7日

PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs

Arxiv

0+阅读 · 2023年7月7日

BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

Arxiv

0+阅读 · 2023年7月7日

The computational asymptotics of Gaussian variational inference and the Laplace approximation

Arxiv

0+阅读 · 2023年7月5日

D-optimal Subsampling Design for Massive Data Linear Regression

Arxiv

0+阅读 · 2023年7月5日

Exact and Parameterized Algorithms for the Independent Cutset Problem

Arxiv

0+阅读 · 2023年7月5日

A $p$-step-ahead sequential adaptive algorithm for D-optimal nonlinear regression design

Arxiv

0+阅读 · 2023年7月5日

Conditional and Residual Methods in Scalable Coding for Humans and Machines

Arxiv

0+阅读 · 2023年7月4日

A Non-Classical Parameterization for Density Estimation Using Sample Moments

Arxiv

0+阅读 · 2023年7月4日

Optimal Surrogate Boundary Selection and Scalability Studies for the Shifted Boundary Method on Octree Meshes

Arxiv

0+阅读 · 2023年7月4日

相关基金

肠道靶向性骨髓间充质干细胞通过重建肠道微生态来治疗实验性IBD的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

循经取穴治疗原发性高血压的宿主代谢-肠道微生物Cross-talk机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

不同生境条件下齿瓣石斛不同生长时期菌根真菌多样性研究

国家自然科学基金

0+阅读 · 2013年12月31日

量子自旋格子系统的拓扑序、量子动力学和量子quench

国家自然科学基金

0+阅读 · 2012年12月31日

MDSCs在动脉粥样硬化中的作用及机制

国家自然科学基金

0+阅读 · 2012年12月31日

次生布风层对分选流化床稳定性的协同作用

国家自然科学基金

0+阅读 · 2011年12月31日

结直肠癌细胞外基质的动态变化特征及其对上皮间质转化的作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

扰动可积非哈密顿系统的极限环分支

国家自然科学基金

0+阅读 · 2011年12月31日

生态恢复对红壤严重侵蚀地土壤水库重建的影响与机制

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员