No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

March 17, 2023

Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos

From arXiv; in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.
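To make the idea of learning rate separation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy two-player bilinear game min_x max_y x*y (a monotone game with its equilibrium at the origin), constant step sizes rather than the schedules analyzed in the paper, and uniformly distributed multiplicative noise. The function names and the parameters `gamma`, `eta`, and `rel_noise` are illustrative assumptions.

```python
import math
import random


def noisy_gradients(x, y, rel_noise, rng):
    """Individual loss gradients for min_x max_y x*y under multiplicative noise."""
    # Player 1 minimizes x*y (gradient y); player 2 minimizes -x*y (gradient -x).
    gx, gy = y, -x
    # Multiplicative (relative) noise: the error scales with the gradient itself,
    # so it vanishes at the equilibrium (0, 0). Each player draws its own noise.
    gx *= 1.0 + rng.uniform(-rel_noise, rel_noise)
    gy *= 1.0 + rng.uniform(-rel_noise, rel_noise)
    return gx, gy


def optimistic_gradient_separated(x0, y0, gamma, eta, steps, rel_noise=0.0, seed=0):
    """Optimistic gradient with separate extrapolation (gamma) and update (eta) rates."""
    rng = random.Random(seed)
    x, y = x0, y0
    gx_prev, gy_prev = 0.0, 0.0  # no feedback is available before the first round
    for _ in range(steps):
        # Extrapolation step: reuse the previous feedback with the larger rate gamma.
        x_half = x - gamma * gx_prev
        y_half = y - gamma * gy_prev
        # Query noisy gradients at the extrapolated (played) point.
        gx_prev, gy_prev = noisy_gradients(x_half, y_half, rel_noise, rng)
        # Update step: move the base point with the smaller rate eta.
        x -= eta * gx_prev
        y -= eta * gy_prev
    return x, y


def distance_to_equilibrium(x, y):
    """Euclidean distance from (x, y) to the equilibrium (0, 0)."""
    return math.hypot(x, y)


if __name__ == "__main__":
    x, y = optimistic_gradient_separated(
        1.0, 1.0, gamma=0.2, eta=0.05, steps=2000, rel_noise=0.5, seed=1
    )
    print(distance_to_equilibrium(x, y))
```

In this sketch the extrapolation rate `gamma` is deliberately larger than the update rate `eta`: the extrapolated point explores more aggressively, while the base point moves conservatively, which limits how much noise is carried into the updates. The specific values 0.2 and 0.05 are arbitrary choices for the toy game.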

