Learning to Generate All Feasible Actions - 专知论文

会员服务 ·

0

可行 · 优化器 · Learning · 可约的 · INTERACT ·

2023 年 1 月 26 日

Learning to Generate All Feasible Actions

翻译：学习生成所有可行动作

Mirco Theile,Daniele Bernardini,Raphael Trumpp,Cristina Piazza,Marco Caccamo,Alberto L. Sangiovanni-Vincentelli

Several machine learning (ML) applications are characterized by searching for an optimal solution to a complex task. The search space for this optimal solution is often very large, so large in fact that this optimal solution is often not computable. Part of the problem is that many candidate solutions found via ML are actually infeasible and have to be discarded. Restricting the search space to only the feasible solution candidates simplifies finding an optimal solution for the tasks. Further, the set of feasible solutions could be re-used in multiple problems characterized by different tasks. In particular, we observe that complex tasks can be decomposed into subtasks and corresponding skills. We propose to learn a reusable and transferable skill by training an actor to generate all feasible actions. The trained actor can then propose feasible actions, among which an optimal one can be chosen according to a specific task. The actor is trained by interpreting the feasibility of each action as a target distribution. The training procedure minimizes a divergence of the actor's output distribution to this target. We derive the general optimization target for arbitrary f-divergences using a combination of kernel density estimates, resampling, and importance sampling. We further utilize an auxiliary critic to reduce the interactions with the environment. A preliminary comparison to related strategies shows that our approach learns to visit all the modes in the feasible action space, demonstrating the framework's potential for learning skills that can be used in various downstream tasks.

翻译：若干机器学习应用的特点是在复杂任务中寻找最优解。寻找最优解的搜索空间通常非常庞大，以至于最优解往往无法计算得出。问题部分在于，通过机器学习找到的许多候选解实际上是不可行的，必须被丢弃。将搜索空间限制在仅包含可行解候选范围内，可以简化任务最优解的寻找过程。此外，可行解集可以在以不同任务为特征的多个问题中重复使用。特别地，我们观察到复杂任务可以分解为子任务及相应的技能。我们提出通过训练一个执行器生成所有可行动作来学习可复用且可迁移的技能。训练后的执行器能够提出可行动作，并可根据具体任务在这些动作中选择最优方案。该执行器的训练过程将每个动作的可行性解释为目标分布，通过最小化执行器输出分布与目标分布之间的散度来实现。我们结合核密度估计、重采样和重要性采样方法，推导出任意f-散度下的通用优化目标，并进一步利用辅助评论家减少与环境交互。与相关策略的初步比较表明，我们的方法能够遍历可行动作空间中的所有模态，验证了该框架在学习可用于多种下游任务的技能方面的潜力。

0

相关内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

植物分子设计中高维数据的低维稀疏逼近方法

国家自然科学基金

0+阅读 · 2015年12月31日

TMOD1调节actin聚合影响胰岛素信号转导的分子机制

国家自然科学基金

0+阅读 · 2013年12月31日

Wnt/β-catenin和 Hedgehog信号通路互作在骨关节中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于本体进化的自演化应用服务系统构造研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

抑癌基因PDCD4调控miR-184和miR-374a抑制鼻咽癌生长及促进凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

细菌纤维素对低温条件下提高甲烷产量影响机制探索

国家自然科学基金

0+阅读 · 2012年12月31日

新型含阴离子受体基团的锂离子二次电池聚合物电解质隔膜研究

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

A Survey of Demonstration Learning

Arxiv

0+阅读 · 2023年3月20日

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models

Arxiv

0+阅读 · 2023年3月20日

Hierarchical-Hyperplane Kernels for Actively Learning Gaussian Process Models of Nonstationary Systems

Arxiv

0+阅读 · 2023年3月17日

Learning to Bound Counterfactual Inference from Observational, Biased and Randomised Data

Arxiv

0+阅读 · 2023年3月16日

Evaluation of distance-based approaches for forensic comparison: Application to hand odor evidence

Arxiv

0+阅读 · 2023年3月16日

Learning Minimally-Violating Continuous Control for Infeasible Linear Temporal Logic Specifications

Arxiv

0+阅读 · 2023年3月16日

A Survey of Learning on Small Data

Arxiv

19+阅读 · 2022年7月29日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Arxiv

36+阅读 · 2022年4月25日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

VIP会员

文章信息

相关主题

最新内容

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

0+阅读 · 今天11:28

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

1+阅读 · 今天11:10

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

8+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

13+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

9+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

8+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

10+阅读 · 7月20日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

伊朗不对称防空战略的演进

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

“天降毒雾”：无人机如何使化学战重返乌克兰战场

对抗环境下超视距目标打击的情报支援

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

【新书发布】原作者MarcG.Bellemare发布315页分布强化学习书籍(DistributionalRL)

深度强化学习实验室

1+阅读 · 2022年1月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

A Survey of Demonstration Learning

Arxiv

0+阅读 · 2023年3月20日

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models

Arxiv

0+阅读 · 2023年3月20日

Hierarchical-Hyperplane Kernels for Actively Learning Gaussian Process Models of Nonstationary Systems

Arxiv

0+阅读 · 2023年3月17日

Learning to Bound Counterfactual Inference from Observational, Biased and Randomised Data

Arxiv

0+阅读 · 2023年3月16日

Evaluation of distance-based approaches for forensic comparison: Application to hand odor evidence

Arxiv

0+阅读 · 2023年3月16日

Learning Minimally-Violating Continuous Control for Infeasible Linear Temporal Logic Specifications

Arxiv

0+阅读 · 2023年3月16日

A Survey of Learning on Small Data

Arxiv

19+阅读 · 2022年7月29日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Arxiv

36+阅读 · 2022年4月25日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

相关基金

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

植物分子设计中高维数据的低维稀疏逼近方法

国家自然科学基金

0+阅读 · 2015年12月31日

TMOD1调节actin聚合影响胰岛素信号转导的分子机制

国家自然科学基金

0+阅读 · 2013年12月31日

Wnt/β-catenin和 Hedgehog信号通路互作在骨关节中的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于本体进化的自演化应用服务系统构造研究

国家自然科学基金

0+阅读 · 2012年12月31日

大规模机器学习问题的结构优化方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

抑癌基因PDCD4调控miR-184和miR-374a抑制鼻咽癌生长及促进凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

细菌纤维素对低温条件下提高甲烷产量影响机制探索

国家自然科学基金

0+阅读 · 2012年12月31日

新型含阴离子受体基团的锂离子二次电池聚合物电解质隔膜研究

国家自然科学基金

0+阅读 · 2011年12月31日

Rho信号通路在张应变诱导人牙周膜细胞细胞骨架重建中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员