One of the most well-studied and best-performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS). Key remaining challenges for MCTS-based MBRL methods are dedicated deep exploration and reliability in the face of the unknown. Both can be alleviated through principled estimation of the epistemic uncertainty in the predictions of MCTS. We present two main contributions. First, we develop a methodology to propagate epistemic uncertainty in MCTS, enabling agents to estimate the epistemic uncertainty in their predictions. Second, we use the propagated uncertainty in a novel deep exploration algorithm that explicitly plans to explore. We incorporate our approach into variations of MCTS-based MBRL approaches with both learned and provided models, and empirically show that it achieves deep exploration through successful epistemic uncertainty estimation. We compare against a non-planning-based deep exploration baseline and demonstrate that planning with epistemic MCTS significantly outperforms non-planning-based exploration in the investigated setting.
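To make "propagating epistemic uncertainty in MCTS" and "planning to explore" concrete, here is a minimal sketch. It is an illustration under stated assumptions and not the method described above. It assumes the model supplies a variance for each predicted reward and for the leaf value estimate, for example from an ensemble. It also assumes reward and future-value errors are independent, so that Var[r + γV] = Var[r] + γ²·Var[V]. The names `Node`, `backup`, and `select_child`, the per-visit averaging of variances, and the optimistic `beta * sigma` selection bonus are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative only."""
    reward: float = 0.0       # model-predicted reward on the edge into this node
    reward_var: float = 0.0   # assumed epistemic variance of that reward
    visits: int = 0
    value_sum: float = 0.0    # sum of backed-up returns from this node
    var_sum: float = 0.0      # sum of backed-up epistemic variances
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

    def mean_var(self) -> float:
        return self.var_sum / self.visits if self.visits else 0.0


def backup(path, leaf_value, leaf_var, gamma=0.99):
    """Propagate a return and its epistemic variance from leaf to root.

    Assumes independent errors: Var[r + gamma * V] = Var[r] + gamma**2 * Var[V].
    """
    value, var = leaf_value, leaf_var
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        node.var_sum += var
        # Convert to the return seen from the parent by adding this edge.
        value = node.reward + gamma * value
        var = node.reward_var + gamma ** 2 * var


def select_child(node, gamma=0.99, beta=1.0, c_uct=1.0):
    """Pick the action maximizing Q + beta * sigma + a UCT visit bonus."""
    total = sum(ch.visits for ch in node.children.values())

    def score(child):
        if child.visits == 0:
            return math.inf  # try unvisited actions first
        q = child.reward + gamma * child.mean_value()
        sigma = math.sqrt(child.reward_var + gamma ** 2 * child.mean_var())
        bonus = c_uct * math.sqrt(math.log(total + 1) / child.visits)
        return q + beta * sigma + bonus  # optimism in epistemic uncertainty

    return max(node.children, key=lambda a: score(node.children[a]))


# Usage: two actions with equal mean return but different epistemic variance.
root = Node()
root.children = {"a": Node(reward=1.0, reward_var=0.5), "b": Node(reward=1.0)}
backup([root, root.children["a"]], leaf_value=2.0, leaf_var=1.0, gamma=0.5)
backup([root, root.children["b"]], leaf_value=2.0, leaf_var=1.0, gamma=0.5)
print(select_child(root, gamma=0.5))  # prints a (larger sigma, same mean)
```

With `beta > 0` the search is steered toward actions whose value is epistemically uncertain, which is one simple way to "plan to explore". A negative `beta` would instead act risk-averse with respect to the unknown.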