Monte-Carlo Tree Search (MCTS) is one of the most well-studied and best-performing planning approaches in Model-Based Reinforcement Learning (MBRL). Two key challenges remain for MCTS-based MBRL methods: dedicated deep exploration and reliability in the face of the unknown. Both can be alleviated through principled estimation of the epistemic uncertainty in the predictions of MCTS. We present two main contributions. First, we develop a methodology to propagate epistemic uncertainty in MCTS, enabling agents to estimate the epistemic uncertainty in their predictions. Second, we use the propagated uncertainty in a novel deep exploration algorithm that explicitly plans to explore. We incorporate our approach into variations of MCTS-based MBRL approaches with learned and provided dynamics models, and show empirically that the resulting epistemic uncertainty estimates enable deep exploration. We compare against a non-planning-based deep exploration baseline and demonstrate that planning with epistemic MCTS significantly outperforms non-planning-based exploration on the investigated deep exploration benchmark.
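To make the two contributions concrete, the following is a minimal sketch of how epistemic uncertainty could be backed up along a search path and then used to score actions optimistically. It is not the paper's algorithm: the variance-based backup (which assumes independent reward and value errors), the optimism coefficient `beta`, and all function names are illustrative assumptions.

```python
import math


def backup_uncertainty(reward_var, child_value_var, discount):
    """Variance of r + discount * v, ASSUMING reward and value errors are independent."""
    return reward_var + discount ** 2 * child_value_var


def propagate_along_path(reward_vars, leaf_value_var, discount):
    """Back up epistemic variance from a leaf to the root of one search path.

    reward_vars[i] is the (hypothetical) epistemic variance of the reward
    predicted at depth i; leaf_value_var is the variance of the value
    estimate at the leaf. Returns the variance at each depth, root first.
    """
    var = leaf_value_var
    variances = []
    for r_var in reversed(reward_vars):
        var = backup_uncertainty(r_var, var, discount)
        variances.append(var)
    variances.reverse()
    return variances


def exploratory_score(value_mean, value_var, beta):
    """Optimistic score: mean value plus beta standard deviations (hypothetical rule)."""
    return value_mean + beta * math.sqrt(value_var)


if __name__ == "__main__":
    path_vars = propagate_along_path([0.0, 0.0], leaf_value_var=1.0, discount=0.5)
    print(path_vars)
    print(exploratory_score(value_mean=1.0, value_var=path_vars[0], beta=2.0))
```

Under these assumptions, uncertainty at a deep, unvisited leaf still raises the score of the root action that leads to it, which is the intuition behind planning to explore.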