In this paper, we learn dynamics models for parametrized families of dynamical systems with varying properties. The dynamics models are formulated as stochastic processes conditioned on a latent context variable, which is inferred from observed transitions of the respective system. The probabilistic formulation allows us to compute an action sequence that, for a limited number of environment interactions, optimally explores the given system within the parametrized family. This is achieved by steering the system through the transitions that are most informative about the context variable. We demonstrate the effectiveness of our method for exploration on a nonlinear toy problem and two well-known reinforcement learning environments.
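The idea of steering toward informative transitions can be illustrated with a minimal sketch. It is not the learned model described above: it assumes a hypothetical scalar toy system x' = x + theta * cos(x) * a + noise, a Gaussian belief over the scalar context theta, and a greedy one-step choice among a fixed set of candidate actions. All function names, the dynamics, and the noise model are assumptions made for illustration only.

```python
import math


def information_gain(prior_var, noise_var, feature):
    """Expected information gain (in nats) about a scalar Gaussian context
    from observing y = context * feature + Gaussian noise.
    In this linear-Gaussian setting the gain does not depend on y."""
    return 0.5 * math.log(1.0 + prior_var * feature ** 2 / noise_var)


def posterior_update(prior_mean, prior_var, noise_var, feature, observation):
    """Conjugate Gaussian update of the belief over the context."""
    post_precision = 1.0 / prior_var + feature ** 2 / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (
        prior_mean / prior_var + feature * observation / noise_var
    )
    return post_mean, post_var


def most_informative_action(state, candidate_actions, prior_var, noise_var):
    """Greedy one-step choice: pick the candidate action whose transition
    is expected to reduce uncertainty about the context the most.
    Assumed toy dynamics: x' = x + context * cos(x) * a + noise."""
    return max(
        candidate_actions,
        key=lambda a: information_gain(
            prior_var, noise_var, math.cos(state) * a
        ),
    )


# Usage: at state 0 the feature is cos(0) * a = a, so the largest-magnitude
# action is the most informative one.
actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
best = most_informative_action(0.0, actions, prior_var=1.0, noise_var=1.0)
mean, var = posterior_update(0.0, 1.0, 1.0, feature=best, observation=2.0 * best)
```

In this toy setting the zero action yields no information about the context, whereas actions of larger magnitude shrink the posterior variance more. Planning over whole action sequences under a learned stochastic-process model, as described above, is beyond the scope of this sketch.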