This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated superhuman performance in many games, it remains unclear which of them is most suitable or efficient for a specific task. Using MiniZero, we systematically evaluate each algorithm on two board games, 9x9 Go and 8x8 Othello, and on 57 Atari games. For the two board games, using more simulations generally results in higher performance, but the choice between AlphaZero and MuZero may depend on game properties. For Atari games, both MuZero and Gumbel MuZero are worth considering. Because each game has unique characteristics, different algorithms and simulation counts yield varying results. In addition, we introduce an approach called progressive simulation, which progressively increases the simulation budget during training to allocate computation more efficiently. Our empirical results demonstrate that progressive simulation achieves significantly superior performance in the two board games. By making our framework and trained models publicly available, this paper contributes a benchmark for future research on zero-knowledge learning algorithms, helping researchers select algorithms and compare against these zero-knowledge learning baselines. Our code and data are available at https://rlg.iis.sinica.edu.tw/papers/minizero.
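To make the idea of progressive simulation concrete, the following is a minimal sketch of a schedule that raises the per-move simulation budget as training proceeds. The abstract only states that the budget increases progressively during training; the geometric shape of the schedule, the function name `simulation_budget`, and the default bounds `n_min` and `n_max` are all illustrative assumptions and are not taken from the MiniZero code.

```python
def simulation_budget(iteration: int, total_iterations: int,
                      n_min: int = 2, n_max: int = 200) -> int:
    """Return a search simulation count for one training iteration.

    Hypothetical schedule (an assumption, not the paper's method): the
    budget grows geometrically from n_min at the first iteration to
    n_max at the last one.
    """
    if total_iterations < 1 or not 0 <= iteration < total_iterations:
        raise ValueError("iteration must lie in [0, total_iterations)")
    if not 1 <= n_min <= n_max:
        raise ValueError("require 1 <= n_min <= n_max")
    if total_iterations == 1:
        # A single iteration has nothing to ramp over; use the full budget.
        return n_max
    # Fraction of training completed, from 0.0 to 1.0 inclusive.
    progress = iteration / (total_iterations - 1)
    # Geometric interpolation between the two bounds.
    budget = n_min * (n_max / n_min) ** progress
    # Round to an integer and clamp against floating-point drift.
    return max(n_min, min(n_max, round(budget)))


if __name__ == "__main__":
    # Print the budget used at each of ten training iterations.
    for it in range(10):
        print(it, simulation_budget(it, 10))
```

Under this sketch, early iterations run cheap searches while the network is still weak, and later iterations spend more computation per move. Other schedules, such as linear or stepwise doubling, would fit the same description.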