Last Switch Dependent Bandits with Monotone Payoff Functions

June 1, 2023

Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi

From arXiv. Accepted to the 40th International Conference on Machine Learning (ICML 2023).

In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.
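To make the satiation and deprivation dynamics in the abstract concrete, here is a minimal Python sketch of a last switch dependent state update. Everything in it is an illustrative assumption rather than the authors' method: the sign convention (a negative state counts consecutive plays, a positive state counts rounds of inactivity), the names `step_state` and `greedy_rollout`, and the example payoff functions. The greedy policy is only a simple baseline for exercising the model; it is not the approximation algorithm described in the paper.

```python
from typing import Callable, List, Tuple


def step_state(tau: int, played: bool) -> int:
    """Advance one arm's last-switch state by one round.

    Assumed convention (hypothetical, for illustration):
      tau < 0 : the arm has been played for |tau| consecutive rounds
      tau > 0 : the arm has been idle for tau consecutive rounds
    """
    if played:
        # Continue a run of plays, or start a new run after idleness.
        return tau - 1 if tau < 0 else -1
    # Continue a run of idleness, or start a new one after a play.
    return tau + 1 if tau > 0 else 1


def greedy_rollout(
    payoffs: List[Callable[[int], float]],
    horizon: int,
    init_state: int = 1,
) -> Tuple[float, List[int]]:
    """Play the arm with the highest current payoff in each round.

    This myopic rule is a baseline only; it carries no approximation
    guarantee and is not the algorithm from the paper.
    """
    states = [init_state] * len(payoffs)
    total = 0.0
    pulls: List[int] = []
    for _ in range(horizon):
        arm = max(range(len(payoffs)), key=lambda i: payoffs[i](states[i]))
        total += payoffs[arm](states[arm])
        pulls.append(arm)
        states = [step_state(s, i == arm) for i, s in enumerate(states)]
    return total, pulls


# Example payoffs, nondecreasing in the state (monotone in the assumed sense):
# arm 0 satiates when replayed and recovers when rested; arm 1 is stationary.
example_payoffs = [
    lambda tau: max(0.0, min(1.0, 0.5 + 0.1 * tau)),
    lambda tau: 0.45,
]

total, pulls = greedy_rollout(example_payoffs, horizon=4)
print(pulls, round(total, 2))
```

With these example payoffs the greedy rule alternates between the two arms, because a single play of arm 0 drops its payoff below the stationary arm and a single round of rest restores it.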
