Communication-Constrained Bandits under Additive Gaussian Noise

April 25, 2023

Prathamesh Mayekar, Jonathan Scarlett, Vincent Y. F. Tan

We study a distributed stochastic multi-armed bandit in which a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge 1}}\right)$ on the minimax regret of any scheme, where $\mathtt{SNR} := \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and the time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then uses the upper confidence bound (UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate estimates of the mean rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.
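The feedback channel and the refinement idea described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's encoding protocol: it assumes a simple centre-and-scale linear encoder, Gaussian rewards with a known variance (`REWARD_VAR`), and hand-picked radii for how far the true mean may lie from the encoder's centre. All function names and parameter values are hypothetical. Under these assumptions the decoded reward carries extra noise of variance `m2_bound / SNR`, so a tighter centre obtained from a coarse first-phase estimate gives a sharper second-phase estimate.

```python
import numpy as np

REWARD_VAR = 1.0  # assumed known reward variance (illustrative choice)


def encode(reward, center, m2_bound, power):
    """Centre and scale so that E[x^2] <= power whenever
    E[(reward - center)^2] <= m2_bound."""
    return np.sqrt(power / m2_bound) * (np.asarray(reward, dtype=float) - center)


def channel(x, noise_var, rng):
    """Additive Gaussian noise channel with variance noise_var."""
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)


def decode(y, center, m2_bound, power):
    """Invert the encoder's scaling and centring."""
    return center + np.sqrt(m2_bound / power) * np.asarray(y, dtype=float)


def decoded_noise_variance(m2_bound, power, noise_var):
    """Variance the channel adds to each decoded reward: m2_bound / SNR."""
    return m2_bound * noise_var / power


def estimate_mean(rewards, center, radius, power, noise_var, rng):
    """Average the decoded rewards, assuming |true mean - center| <= radius."""
    m2_bound = REWARD_VAR + radius ** 2
    y = channel(encode(rewards, center, m2_bound, power), noise_var, rng)
    return float(np.mean(decode(y, center, m2_bound, power)))


def two_phase_estimate(true_mean=5.0, n=5000, power=1.0, noise_var=4.0, seed=0):
    """Phase 1 encodes around 0 with a loose radius; phase 2 re-centres
    the encoder at the coarse phase-1 estimate with a tighter radius."""
    rng = np.random.default_rng(seed)

    def draw():
        return rng.normal(true_mean, np.sqrt(REWARD_VAR), size=n)

    coarse = estimate_mean(draw(), 0.0, 10.0, power, noise_var, rng)
    refined = estimate_mean(draw(), coarse, 1.0, power, noise_var, rng)
    return coarse, refined


if __name__ == "__main__":
    coarse, refined = two_phase_estimate()
    print("coarse estimate:", coarse)
    print("refined estimate:", refined)
```

With the default values (SNR = 0.25), the loose first-phase bound of 101 adds noise of variance 404 to each decoded reward, while the tighter second-phase bound of 2 adds only 8. This gap is the effect the abstract calls a positive reinforcement cycle, shown here only for this simplified encoder.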
