Efficient Exploration in Continuous-time Model-based Reinforcement Learning

Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy (MSS), since in continuous time we must decide not only how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian processes (GPs) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as of our proposed adaptive MSS over standard baselines, on several applications.
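To make the role of the MSS concrete, the following is a minimal sketch of equidistant sampling, the simplest strategy named above. It is an illustration under stated assumptions, not the paper's implementation: the damped-pendulum dynamics, the RK4 integrator, the choice to observe noisy state derivatives, and all function names (`pendulum`, `equidistant_mss`, `collect_measurements`) are hypothetical and do not come from the abstract.

```python
import numpy as np

def pendulum(x, u):
    """Hypothetical true dynamics f*(x, u): damped pendulum, x = (angle, angular velocity)."""
    theta, omega = x
    return np.array([omega, -9.81 * np.sin(theta) - 0.1 * omega + u])

def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta step, holding the control u fixed over the step."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def equidistant_mss(horizon, num_measurements):
    """Equidistant MSS: observation times spread evenly over [0, horizon)."""
    return np.linspace(0.0, horizon, num_measurements, endpoint=False)

def collect_measurements(policy, x0, horizon, times, dt=1e-3, noise_std=0.01, seed=0):
    """Roll out the policy on a fine time grid and record noisy observations
    only at the times chosen by the MSS.

    Assumption: each observation is a noisy state derivative, recorded as
    (t, x, u, y); these tuples would be the training data for a dynamics model.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(horizon / dt))
    # Map each requested time to the nearest integration step.
    measure_at = {int(i) for i in np.round(times / dt)}
    x = np.array(x0, dtype=float)
    data = []
    for k in range(steps):
        u = policy(x)
        if k in measure_at:
            y = pendulum(x, u) + noise_std * rng.standard_normal(2)
            data.append((k * dt, x.copy(), u, y))
        x = rk4_step(pendulum, x, u, dt)
    return data

# Usage: 10 equidistant observations over a 2-second episode with zero control.
times = equidistant_mss(horizon=2.0, num_measurements=10)
data = collect_measurements(lambda x: 0.0, x0=[0.5, 0.0], horizon=2.0, times=times)
```

An adaptive MSS would replace `equidistant_mss` with a rule that picks observation times from the current model's uncertainty; the abstract does not give enough detail to sketch that rule here.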
