Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition, and the deviation of real-world dynamics at deployment from those of the training environment. To address these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that uses Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., a simulator). We further establish the statistical sample complexity of the proposed method for each of these uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally robust policies. The proposed method can also be combined with model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior performance in terms of the number of samples required.
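To make the sampling step concrete, here is a minimal sketch of generic maximum variance reduction (MVR) with a Gaussian Process, not the authors' implementation. At each round it queries the simulator at the state-action pair whose GP posterior variance is largest. It then fits a posterior mean for each output dimension of the next state. Several choices here are illustrative assumptions: the squared-exponential kernel, the `lengthscale` and `noise` values, the finite candidate grid, modeling the multi-output dynamics as independent GPs that share one kernel, and the hypothetical `toy_simulator` that stands in for the generative model. Because GP posterior variance depends only on the query inputs and not on the observed outputs, the MVR query sequence is determined by the candidate set and the kernel alone.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def posterior_variance(X_obs, Z, noise=1e-2, lengthscale=0.5):
    """GP posterior variance at the rows of Z given observed inputs X_obs.
    Note: this never looks at the observed outputs."""
    prior = np.ones(len(Z))  # k(z, z) = 1 for the RBF kernel
    if len(X_obs) == 0:
        return prior
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise * np.eye(len(X_obs))
    Kz = rbf_kernel(X_obs, Z, lengthscale)  # shape (n_obs, n_candidates)
    sol = np.linalg.solve(K, Kz)
    return prior - np.sum(Kz * sol, axis=0)

def mvr_sample(candidates, simulator, budget, noise=1e-2, lengthscale=0.5):
    """Maximum variance reduction: repeatedly query the most uncertain candidate."""
    X = np.empty((0, candidates.shape[1]))
    Y = []
    for _ in range(budget):
        var = posterior_variance(X, candidates, noise, lengthscale)
        z = candidates[np.argmax(var)]  # state-action pair with largest variance
        X = np.vstack([X, z])
        Y.append(simulator(z))          # one call to the generative model
    return X, np.array(Y)

def gp_mean(X, Y, Z, noise=1e-2, lengthscale=0.5):
    """Posterior mean for every output dimension (independent GPs, shared kernel)."""
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, Y)  # shape (n_obs, d_out)
    return rbf_kernel(Z, X, lengthscale) @ alpha

def toy_simulator(z):
    """Hypothetical nonlinear dynamics: state (s1, s2), action a -> next state."""
    s1, s2, a = z
    return np.array([s1 + 0.1 * s2, s2 + 0.1 * (np.sin(s1) + a)])

# Candidate state-action pairs on a grid over [-1, 1]^3 (illustrative choice).
grid = np.array([[s1, s2, a]
                 for s1 in np.linspace(-1, 1, 7)
                 for s2 in np.linspace(-1, 1, 7)
                 for a in np.linspace(-1, 1, 5)])
X, Y = mvr_sample(grid, toy_simulator, budget=30)
pred = gp_mean(X, Y, grid)  # learned nominal next-state means, shape (245, 2)
```

The learned mean `pred` plays the role of a nominal transition model. A distributionally robust planner would then optimize against the chosen KL, chi-square, or total variation ball around that nominal model. That planning step is not shown here.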