Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas

May 10, 2023

Harshad Khadilkar

From arXiv, 17 pages

We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators. The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space. Unlike prior literature on combining evolutionary search (ES) with RL, this work does not generate a distribution of agents from a common mean and covariance matrix. Neither does it require the evaluation of the entire population of policies at every time step. Instead, we focus on gradient-based training throughout the life of every policy (individual), with a sparse amount of evolutionary exploration. The resulting algorithm is shown to be robust to hyperparameter variations. As a surprising corollary, we show that simply initialising and training multiple RL agents with a common memory (with no further evolutionary updates) outperforms several standard RL baselines.
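
The abstract does not specify the evolutionary operators, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes each agent's policy is a flat list of parameters, and it uses uniform crossover, Gaussian mutation, and a "replace the weakest agent" rule as illustrative choices. The names `SharedBuffer`, `crossover`, `mutate`, and `evolve` are hypothetical.

```python
import random
from collections import deque


class SharedBuffer:
    """Replay memory that every agent in the population writes to and samples from."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the current buffer size.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed operator): copy each parameter from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(params, sigma, rng):
    """Gaussian mutation (assumed operator): add zero-mean noise to every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def evolve(population, fitness, rng, sigma=0.1):
    """Return a new population in which the weakest agent is replaced by a
    mutated child of the two fittest agents. Requires at least two agents."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    best, second, worst = order[0], order[1], order[-1]
    child = mutate(crossover(population[best], population[second], rng), sigma, rng)
    new_population = list(population)
    new_population[worst] = child
    return new_population
```

In a full training loop, every agent would take ordinary gradient steps on minibatches drawn from the one `SharedBuffer`, and `evolve` would be called only occasionally, which matches the abstract's emphasis on gradient-based training with sparse evolutionary exploration. Skipping the `evolve` call entirely corresponds to the "common memory, no evolutionary updates" variant mentioned at the end of the abstract.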
