Given a traversal algorithm, the cover time is the expected number of steps needed to visit every node of a given graph. A smaller cover time means that the traversal algorithm explores the graph more efficiently. Although random walk algorithms have been studied extensively in the existing literature, no cover time result has been established for any non-Markovian method. In this work, we take a theoretical perspective and show that the negative feedback strategy (a count-based exploration method) outperforms the naive random walk search. In particular, the negative feedback strategy can locally improve search efficiency on an arbitrary graph. It also achieves smaller cover times on special but important classes of graphs, such as clique graphs and tree graphs. Moreover, we connect our results to the reinforcement learning literature, offering new insights into why the classical UCB and MCTS algorithms are so effective. A range of numerical results corroborates our theoretical findings.
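The comparison between the two strategies can be made concrete with a small Monte Carlo simulation of cover time. This is a minimal sketch, not the authors' method: it assumes that the negative feedback strategy means "move to a neighbor with the fewest visits so far, breaking ties uniformly at random", and every function name below (`complete_graph`, `path_graph`, `negative_feedback_step`, `estimate_cover_time`, and so on) is hypothetical. Because the assumed rule depends on the whole visit history rather than only the current node, it is non-Markovian, whereas the naive random walk is Markovian.

```python
import random
from statistics import mean


def complete_graph(n):
    """Adjacency lists for the clique K_n (no self-loops)."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def path_graph(n):
    """Adjacency lists for the path 0-1-...-(n-1), a simple tree."""
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}


def cover_steps(adj, start, choose, rng):
    """Walk from `start` until every node is visited; return the step count."""
    counts = {v: 0 for v in adj}  # visit count per node
    counts[start] = 1
    unvisited = len(adj) - 1
    current, steps = start, 0
    while unvisited > 0:
        current = choose(adj[current], counts, rng)
        if counts[current] == 0:
            unvisited -= 1
        counts[current] += 1
        steps += 1
    return steps


def random_walk_step(neighbors, counts, rng):
    """Naive (Markovian) random walk: pick a neighbor uniformly at random."""
    return rng.choice(neighbors)


def negative_feedback_step(neighbors, counts, rng):
    """ASSUMED count-based rule: go to a least-visited neighbor, ties at random."""
    fewest = min(counts[u] for u in neighbors)
    return rng.choice([u for u in neighbors if counts[u] == fewest])


def estimate_cover_time(adj, choose, trials=2000, seed=0):
    """Monte Carlo estimate of the expected cover time starting from node 0."""
    rng = random.Random(seed)
    return mean(cover_steps(adj, 0, choose, rng) for _ in range(trials))


if __name__ == "__main__":
    for name, adj in [("clique K_10", complete_graph(10)), ("path P_10", path_graph(10))]:
        rw = estimate_cover_time(adj, random_walk_step)
        nf = estimate_cover_time(adj, negative_feedback_step)
        print(f"{name}: random walk ~ {rw:.1f}, negative feedback ~ {nf:.1f}")
```

Under this assumed rule, the count-based walk covers both K_10 and P_10 (started at an endpoint) in exactly 9 steps, since an unvisited neighbor is always available until the graph is covered. For comparison, the naive random walk has expected cover time (n-1)H_{n-1} ≈ 25.5 on K_10 by the coupon-collector argument, and (n-1)^2 = 81 on P_10 started at an endpoint, so the Monte Carlo estimates should land near those values.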