Data-Driven Regret Balancing for Online Model Selection in Bandits - 专知论文

会员服务 ·

0

模型选择 · 基学习器 · 赌博机/老虎机 · 基 · 学习器 ·

2023 年 6 月 5 日

Data-Driven Regret Balancing for Online Model Selection in Bandits

翻译：数据驱动的遗憾平衡方法用于Bandits中的在线模型选择

Aldo Pacchiano,Christoph Dann,Claudio Gentile

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner. Model selection is performed by regret balancing but, unlike the recent literature on this subject, we do not assume any prior knowledge about the base learners like candidate regret guarantees; instead, we uncover these quantities in a data-driven manner. The meta-learner is therefore able to leverage the realized regret incurred by each base learner for the learning environment at hand (as opposed to the expected regret), and single out the best such regret. We design two model selection algorithms operating with this more ambitious notion of regret and, besides proving model selection guarantees via regret balancing, we experimentally demonstrate the compelling practical benefits of dealing with actual regrets instead of candidate regret bounds.

翻译：我们考虑在随机环境中具有Bandit反馈的序列决策中的模型选择问题，其中元学习器拥有一个基础学习器池，并根据每个基础学习器推荐的策略实时决定采取何种行动。模型选择通过遗憾平衡方法实现，但与近期相关文献不同，我们无需假设任何关于基础学习器的先验知识（如候选遗憾保证），而是以数据驱动的方式揭示这些量。因此，元学习器能够利用每个基础学习器在当前学习环境中产生的实际遗憾（而非期望遗憾），并从中选出最佳遗憾值。我们设计了两种基于这一更具挑战性的遗憾概念的模型选择算法，除了通过遗憾平衡证明模型选择保证外，还通过实验证明了处理实际遗憾而非候选遗憾界所带来的显著实际优势。

0

相关内容

模型选择

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

73+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

多离合器ISG混合动力汽车分层多模式切换协调控制与优化

国家自然科学基金

1+阅读 · 2014年12月31日

马铃薯磷转运基因家族1的克隆及其功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

Affordance辅助服务机器人识别形状不规则物体研究

国家自然科学基金

0+阅读 · 2013年12月31日

钢中奥氏体稳态动态再结晶行为的研究

国家自然科学基金

0+阅读 · 2013年12月31日

自适应多分辨率宽带频谱压缩感知

国家自然科学基金

0+阅读 · 2012年12月31日

高原天然染料敏化剂的提纯、改性和共敏化的研究

国家自然科学基金

0+阅读 · 2012年12月31日

马铃薯野生种（Solanum pinnatisectum）块茎变紫的光谱效应及其分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

环境友好型高取向织构化铁电压电陶瓷的制备及机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

Multi-Model Subset Selection

Arxiv

0+阅读 · 2023年7月26日

On the Learning Dynamics of Attention Networks

Arxiv

0+阅读 · 2023年7月26日

Anytime Model Selection in Linear Bandits

Arxiv

0+阅读 · 2023年7月24日

Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection

Arxiv

0+阅读 · 2023年7月24日

Adaptive debiased machine learning using data-driven model selection techniques

Arxiv

0+阅读 · 2023年7月24日

Scalable solution to crossed random effects model with random slopes

Arxiv

0+阅读 · 2023年7月23日

Advancing Ad Auction Realism: Practical Insights & Modeling Implications

Arxiv

0+阅读 · 2023年7月21日

Multiple bias-calibration for adjusting selection bias of non-probability samples using data integration

Arxiv

0+阅读 · 2023年7月21日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

A Survey on Causal Inference

Arxiv

113+阅读 · 2020年2月5日

VIP会员

文章信息

相关主题

赌博机/老虎机

最新内容

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

专知会员服务

4+阅读 · 7月17日

《边缘端实时无线感知赋能现场多机器人部署》200页

《边缘端实时无线感知赋能现场多机器人部署》200页

专知会员服务

5+阅读 · 7月17日

战力倍增器：自主武器系统与乌克兰及加沙冲突

战力倍增器：自主武器系统与乌克兰及加沙冲突

专知会员服务

4+阅读 · 7月17日

人工智能赋能战场情报：提速决策进程

人工智能赋能战场情报：提速决策进程

专知会员服务

2+阅读 · 7月17日

《拥抱新兴技术：面向未来军官的教育革新》

《拥抱新兴技术：面向未来军官的教育革新》

专知会员服务

5+阅读 · 7月17日

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

专知会员服务

2+阅读 · 7月17日

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

专知会员服务

3+阅读 · 7月17日

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

专知会员服务

11+阅读 · 7月16日

《无人地面战车（UGV）的崛起》报告

《无人地面战车（UGV）的崛起》报告

专知会员服务

7+阅读 · 7月16日

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

专知会员服务

6+阅读 · 7月16日

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

专知会员服务

13+阅读 · 7月16日

美陆军任务式指挥人工智能解决方案

美陆军任务式指挥人工智能解决方案

专知会员服务

13+阅读 · 7月16日

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

专知会员服务

9+阅读 · 7月16日

综述 | 现代智能体自我改进，从模型更新到脚手架演化

综述 | 现代智能体自我改进，从模型更新到脚手架演化

专知会员服务

16+阅读 · 7月16日

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

专知会员服务

13+阅读 · 7月15日

相关VIP内容

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

不可错过！700+ppt《因果推理》课程！杜克大学Fan Li教程

专知会员服务

73+阅读 · 2022年7月11日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

80+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《边缘端实时无线感知赋能现场多机器人部署》200页

人工智能赋能战场情报：提速决策进程

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

战力倍增器：自主武器系统与乌克兰及加沙冲突

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Multi-Model Subset Selection

Arxiv

0+阅读 · 2023年7月26日

On the Learning Dynamics of Attention Networks

Arxiv

0+阅读 · 2023年7月26日

Anytime Model Selection in Linear Bandits

Arxiv

0+阅读 · 2023年7月24日

Trade-off between predictive performance and FDR control for high-dimensional Gaussian model selection

Arxiv

0+阅读 · 2023年7月24日

Adaptive debiased machine learning using data-driven model selection techniques

Arxiv

0+阅读 · 2023年7月24日

Scalable solution to crossed random effects model with random slopes

Arxiv

0+阅读 · 2023年7月23日

Advancing Ad Auction Realism: Practical Insights & Modeling Implications

Arxiv

0+阅读 · 2023年7月21日

Multiple bias-calibration for adjusting selection bias of non-probability samples using data integration

Arxiv

0+阅读 · 2023年7月21日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

A Survey on Causal Inference

Arxiv

113+阅读 · 2020年2月5日

相关基金

多离合器ISG混合动力汽车分层多模式切换协调控制与优化

国家自然科学基金

1+阅读 · 2014年12月31日

马铃薯磷转运基因家族1的克隆及其功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

Affordance辅助服务机器人识别形状不规则物体研究

国家自然科学基金

0+阅读 · 2013年12月31日

钢中奥氏体稳态动态再结晶行为的研究

国家自然科学基金

0+阅读 · 2013年12月31日

自适应多分辨率宽带频谱压缩感知

国家自然科学基金

0+阅读 · 2012年12月31日

高原天然染料敏化剂的提纯、改性和共敏化的研究

国家自然科学基金

0+阅读 · 2012年12月31日

马铃薯野生种（Solanum pinnatisectum）块茎变紫的光谱效应及其分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

环境友好型高取向织构化铁电压电陶瓷的制备及机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员