Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this challenge by incorporating prior model knowledge to guide exploration and accelerate learning. Specifically, we assume access to a model set that contains the true transition kernel and reward function. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the agent's exploration. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we introduce a data-driven, regularized version of the model-set optimization problem that ensures convergence of the exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely that of a bounded-parameter MDP (BMDP), the regularized model-set optimization problem becomes convex and simple to implement. In this setting, we also prove finite-time convergence to the optimal policy under mild assumptions. We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX (Bounded Uncertainty Model-based Exploration), in a simulation study. The results indicate that it can significantly accelerate learning in benchmark examples. A toolbox is available at https://github.com/JvHulst/BUMEX.
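To make the BMDP case concrete, the following is a minimal sketch, not the BUMEX toolbox API, of how upper and lower Q-bounds could be computed by interval Q-iteration over elementwise transition and reward bounds, together with one plausible optimism-based exploration rule. All array shapes, function names, and the exploration rule are illustrative assumptions.

```python
# A minimal sketch (assumptions, not the authors' implementation): interval
# Q-iteration on a bounded-parameter MDP with elementwise bounds
# P_lo, P_hi of shape (S, A, S) on transition probabilities and
# R_lo, R_hi of shape (S, A) on rewards.
import numpy as np

def extremal_expectation(p_lo, p_hi, v, maximize=True):
    """Pick p with p_lo <= p <= p_hi and sum(p) = 1 that extremizes p @ v,
    using the standard order-based construction for interval transition sets.
    Assumes sum(p_lo) <= 1 <= sum(p_hi)."""
    order = np.argsort(v)[::-1] if maximize else np.argsort(v)
    p = p_lo.copy()
    budget = 1.0 - p.sum()                 # probability mass left to assign
    for s in order:                        # favour high- (or low-) value states
        add = min(p_hi[s] - p_lo[s], budget)
        p[s] += add
        budget -= add
        if budget <= 1e-12:
            break
    return p @ v

def interval_q_bounds(P_lo, P_hi, R_lo, R_hi, gamma=0.95, iters=500):
    """Iterate optimistic and pessimistic Bellman backups over the model set
    to obtain upper and lower bounds on the optimal Q-function."""
    S, A, _ = P_lo.shape
    Q_hi = np.zeros((S, A))
    Q_lo = np.zeros((S, A))
    for _ in range(iters):
        V_hi, V_lo = Q_hi.max(axis=1), Q_lo.max(axis=1)
        for s in range(S):
            for a in range(A):
                Q_hi[s, a] = R_hi[s, a] + gamma * extremal_expectation(
                    P_lo[s, a], P_hi[s, a], V_hi, maximize=True)
                Q_lo[s, a] = R_lo[s, a] + gamma * extremal_expectation(
                    P_lo[s, a], P_hi[s, a], V_lo, maximize=False)
    return Q_lo, Q_hi

def explore_action(Q_lo, Q_hi, s):
    """One plausible exploration rule: act greedily w.r.t. the optimistic bound."""
    return int(np.argmax(Q_hi[s]))
```

The gap Q_hi - Q_lo quantifies the remaining model uncertainty per state-action pair; exploration rules that prefer actions with a large gap or a large optimistic value are one natural way such bounds could steer data collection.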