Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits - 专知论文

会员服务 ·

0

PBT · 超参数 · 赌博机/老虎机 · Performer · 优化器 ·

2021 年 2 月 22 日

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

翻译：利用以人口为主的强盗优化在线超光度计

Jack Parker-Holder,Vu Nguyen,Stephen Roberts

from arxiv, Camera-ready version, NeurIPS 2020

Many of the recent triumphs in machine learning are dependent on well-tuned hyperparameters. This is particularly prominent in reinforcement learning (RL) where a small change in the configuration can lead to failure. Despite the importance of tuning hyperparameters, it remains expensive and is often done in a naive and laborious way. A recent solution to this problem is Population Based Training (PBT) which updates both weights and hyperparameters in a single training run of a population of agents. PBT has been shown to be particularly effective in RL, leading to widespread use in the field. However, PBT lacks theoretical guarantees since it relies on random heuristics to explore the hyperparameter space. This inefficiency means it typically requires vast computational resources, which is prohibitive for many small and medium sized labs. In this work, we introduce the first provably efficient PBT-style algorithm, Population-Based Bandits (PB2). PB2 uses a probabilistic model to guide the search in an efficient way, making it possible to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT. We show in a series of RL experiments that PB2 is able to achieve high performance with a modest computational budget.

翻译：近来在机器学习方面的许多成功都依赖于经过良好调整的超参数。这在强化学习(RL)中特别突出,因为对配置进行小的改变可能导致失败。尽管调整超参数很重要,但费用仍然昂贵,而且往往以幼稚和艰苦的方式完成。这个问题的最近解决办法是人口培训(PBT),它更新了一个代理人员单一培训的重量和超参数。PBT已证明在RL中特别有效,导致外地的广泛使用。然而,PBT缺乏理论保障,因为它依靠随机的超参数空间探索。这种效率低通常意味着它需要大量的计算资源,而许多中小型实验室则无法使用这种资源。在这项工作中,我们采用了第一个效率很高的PBT型算法(PPB2)。 PB2使用一种概率模型来有效指导搜索,从而能够发现高性超参数配置,而比通常高的PB2级测试要低得多。我们用一个普通的PB2级测试来显示一个普通的PLT。

0

相关内容

PBT

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

DARPA可解释人工智能

DARPA可解释人工智能

专知会员服务

135+阅读 · 2020年12月22日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

124+阅读 · 2020年5月30日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

86+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【NYU CS-GY 9223I】算法机器学习和数据科学（Algorithmic Machine Learning and Data Science），纽约大学坦顿工程学院计算机科学与工程助理教授 |Christopher Musco

【NYU CS-GY 9223I】算法机器学习和数据科学（Algorithmic Machine Learning and Data Science），纽约大学坦顿工程学院计算机科学与工程助理教授 |Christopher Musco

专知会员服务

20+阅读 · 2019年12月24日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

已删除

将门创投

4+阅读 · 2018年6月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Rule-Based Reinforcement Learning for Efficient Robot Navigation with Space Reduction

Arxiv

0+阅读 · 2021年4月15日

Safe Continuous Control with Constrained Model-Based Policy Optimization

Arxiv

0+阅读 · 2021年4月14日

ABEM: An Adaptive Agent-based Evolutionary Approach for Mining Influencers in Online Social Networks

Arxiv

0+阅读 · 2021年4月14日

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2021年4月13日

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Arxiv

0+阅读 · 2021年4月13日

The Sample Complexity of Up-to-$\varepsilon$ Multi-Dimensional Revenue Maximization

Arxiv

0+阅读 · 2021年4月9日

The Menu-Size Complexity of Revenue Approximation

Arxiv

0+阅读 · 2021年4月9日

Population network structure impacts genetic algorithm optimisation performance

Arxiv

0+阅读 · 2021年4月9日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

赌博机/老虎机

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

0+阅读 · 今天15:20

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

0+阅读 · 今天15:18

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

8+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

4+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

2+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

4+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

8+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

10+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

相关VIP内容

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

DARPA可解释人工智能

DARPA可解释人工智能

专知会员服务

135+阅读 · 2020年12月22日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

124+阅读 · 2020年5月30日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

86+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【NYU CS-GY 9223I】算法机器学习和数据科学（Algorithmic Machine Learning and Data Science），纽约大学坦顿工程学院计算机科学与工程助理教授 |Christopher Musco

【NYU CS-GY 9223I】算法机器学习和数据科学（Algorithmic Machine Learning and Data Science），纽约大学坦顿工程学院计算机科学与工程助理教授 |Christopher Musco

专知会员服务

20+阅读 · 2019年12月24日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

已删除

将门创投

4+阅读 · 2018年6月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Rule-Based Reinforcement Learning for Efficient Robot Navigation with Space Reduction

Arxiv

0+阅读 · 2021年4月15日

Safe Continuous Control with Constrained Model-Based Policy Optimization

Arxiv

0+阅读 · 2021年4月14日

ABEM: An Adaptive Agent-based Evolutionary Approach for Mining Influencers in Online Social Networks

Arxiv

0+阅读 · 2021年4月14日

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2021年4月13日

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Arxiv

0+阅读 · 2021年4月13日

The Sample Complexity of Up-to-$\varepsilon$ Multi-Dimensional Revenue Maximization

Arxiv

0+阅读 · 2021年4月9日

The Menu-Size Complexity of Revenue Approximation

Arxiv

0+阅读 · 2021年4月9日

Population network structure impacts genetic algorithm optimisation performance

Arxiv

0+阅读 · 2021年4月9日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员