Probabilistic world models increase the data efficiency of model-based reinforcement learning (MBRL) by using their epistemic uncertainty to guide the policy toward better exploration and the acquisition of new samples. Moreover, the uncertainty-aware learning procedures of probabilistic approaches yield robust policies that are less sensitive to noisy observations than those of uncertainty-unaware solutions. We propose combining trajectory sampling with deep Gaussian covariance networks (DGCNs) as a data-efficient solution to MBRL problems in an optimal control setting. We compare trajectory sampling with density-based approximation for uncertainty propagation using three different probabilistic world models: Gaussian processes, Bayesian neural networks, and DGCNs. We provide empirical evidence on four well-known test environments that our method improves sample efficiency over other combinations of uncertainty propagation methods and probabilistic models. In our tests, we place particular emphasis on the robustness of the learned policies to noisy initial states.
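To make the difference between the two uncertainty-propagation schemes concrete, the following is a minimal sketch under stated assumptions, not the method evaluated here. `model_predict` is a hypothetical 1-D heteroscedastic Gaussian world model that stands in for a GP, BNN, or DGCN. Trajectory sampling pushes individual particles through the model, so the final-state distribution can take any shape. The density-based variant keeps a single Gaussian and refits its mean and variance at every step. Here that refitting is done by Monte Carlo moment matching via the law of total variance, which is a swapped-in numerical stand-in for analytic moment matching.

```python
import numpy as np


def model_predict(x):
    """Hypothetical 1-D probabilistic world model (assumption): returns the
    predictive mean and variance of the next state for each input state."""
    mean = x + 0.5 * np.sin(x)
    var = 0.01 + 0.05 * x**2  # heteroscedastic predictive variance (assumption)
    return mean, var


def trajectory_sampling(x0_mean, x0_var, horizon, n_particles, rng):
    """Propagate particles; each particle samples its own next state."""
    x = rng.normal(x0_mean, np.sqrt(x0_var), size=n_particles)
    for _ in range(horizon):
        mean, var = model_predict(x)
        x = rng.normal(mean, np.sqrt(var))
    return x  # samples of the final-state distribution (may be non-Gaussian)


def moment_matching(x0_mean, x0_var, horizon, n_samples, rng):
    """Density-based propagation: keep one Gaussian and refit its moments."""
    mu, s2 = x0_mean, x0_var
    for _ in range(horizon):
        x = rng.normal(mu, np.sqrt(s2), size=n_samples)
        mean, var = model_predict(x)
        mu = mean.mean()
        s2 = var.mean() + mean.var()  # law of total variance
    return mu, s2


rng = np.random.default_rng(0)
particles = trajectory_sampling(0.0, 0.1, horizon=10, n_particles=1000, rng=rng)
mu, s2 = moment_matching(0.0, 0.1, horizon=10, n_samples=1000, rng=rng)
print(particles.mean(), particles.var(), mu, s2)
```

Both functions summarise the same predicted state distribution. The density-based variant, however, discards any multimodality at every step, whereas the particle set retains it. Under the same illustrative assumptions, this is the property that makes trajectory sampling attractive when it is paired with an expressive probabilistic model.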