One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by building and using state and/or action abstractions in parallel to the tree search so that information can be shared among nodes of the same layer. The primary use of abstractions in MCTS is to enhance the Upper Confidence Bound (UCB) value during the tree policy by aggregating the visits and returns of an abstract node. However, this direct use of abstractions does not account for the case where multiple actions with the same parent fall into the same abstract node: these actions then all have the same UCB value, so a tiebreak rule is required. In state-of-the-art abstraction algorithms such as pruned On the Go Abstractions (pruned OGA), this case has gone unnoticed, and a random tiebreak rule was implicitly chosen. In this paper, we propose and empirically evaluate several alternative intra-abstraction policies, several of which outperform the random policy across a majority of environments and parameter settings.
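To make the tiebreak issue concrete, the following is a minimal sketch, assuming a simple node interface; the attribute names (`children`, `visits`, `total_return`, `abstract_node`) are hypothetical, and the `least_visited` rule is only an illustrative alternative intra-abstraction policy, not necessarily one of those evaluated here. It shows how UCB computed from aggregated abstract-node statistics is identical for all sibling actions in the same abstract node, which is exactly where the intra-abstraction policy takes over.

```python
import math
import random

def abstract_ucb(action, parent_visits, c=math.sqrt(2)):
    """UCB computed from the aggregated statistics of the action's abstract node.

    All sibling actions grouped into the same abstract node share these
    statistics and therefore receive exactly the same UCB value.
    """
    a = action.abstract_node
    if a.visits == 0:
        return float("inf")
    return a.total_return / a.visits + c * math.sqrt(math.log(parent_visits) / a.visits)

def select_action(parent, intra_policy="random", c=math.sqrt(2)):
    """Pick an argmax of the abstract UCB; break ties among actions that share
    the best value with an intra-abstraction policy."""
    best = max(abstract_ucb(a, parent.visits, c) for a in parent.children)
    # Exact equality is the intended case: actions in the same abstract node
    # have identical aggregated statistics and hence identical UCB values.
    tied = [a for a in parent.children
            if abstract_ucb(a, parent.visits, c) == best]
    if intra_policy == "random":         # the implicit choice in pruned OGA
        return random.choice(tied)
    if intra_policy == "least_visited":  # illustrative alternative: balance ground-level visits
        return min(tied, key=lambda a: a.visits)
    raise ValueError(f"unknown intra-abstraction policy: {intra_policy}")
```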