The AlphaZero/MuZero (A/MZ) family of algorithms has achieved remarkable success across various challenging domains by integrating Monte Carlo Tree Search (MCTS) with learned models. Learned models introduce epistemic uncertainty, which arises from learning on limited data and is useful for exploration in sparse-reward environments. However, MCTS does not account for the propagation of this uncertainty. To address this, we introduce Epistemic MCTS (EMCTS): a theoretically motivated approach that accounts for epistemic uncertainty in search and harnesses the search for deep exploration. On the challenging sparse-reward task of writing code in the Assembly language {\sc subleq}, AZ paired with our method achieves significantly higher sample efficiency than baseline AZ. Search with EMCTS solves variations of the commonly used hard-exploration benchmark Deep Sea, which baseline A/MZ are practically unable to solve, much faster than an otherwise equivalent method that does not use search for uncertainty estimation, demonstrating significant benefits of search for epistemic uncertainty estimation.