This paper presents MiniZero, a zero-knowledge learning framework that supports four state-of-the-art algorithms: AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero. While these algorithms have demonstrated superhuman performance in many games, it remains unclear which of them is most suitable or efficient for a specific task. Through MiniZero, we systematically evaluate the performance of each algorithm in two board games, 9x9 Go and 8x8 Othello, as well as 57 Atari games. For the two board games, using more simulations generally results in higher performance; however, the choice between AlphaZero and MuZero may depend on game properties. For Atari games, both MuZero and Gumbel MuZero are worth considering. Since each game has unique characteristics, different algorithms and simulation budgets yield different results. In addition, we introduce an approach, called progressive simulation, which progressively increases the simulation budget during training to allocate computation more efficiently. Our empirical results demonstrate that progressive simulation achieves significantly superior performance in the two board games. By making our framework and trained models publicly available, this paper contributes a benchmark for future research on zero-knowledge learning algorithms, assisting researchers in algorithm selection and in comparison against these zero-knowledge learning baselines. Our code and data are available at https://rlg.iis.sinica.edu.tw/papers/minizero.
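The progressive simulation idea described above can be sketched as a schedule that maps training progress to an MCTS simulation budget. The function name, the budget bounds, and the linear interpolation below are illustrative assumptions for exposition, not the paper's exact schedule:

```python
def progressive_simulation_budget(step: int, total_steps: int,
                                  min_sims: int = 2, max_sims: int = 200) -> int:
    """Return the MCTS simulation budget at a given training step.

    Hypothetical linear schedule: early training uses few simulations
    (cheap, noisy targets), and the budget grows toward max_sims as
    training progresses. min_sims/max_sims and the linear ramp are
    illustrative choices, not values from the paper.
    """
    # Clamp training progress to [0, 1] so the schedule is well-defined
    # even if called slightly past total_steps.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return int(round(min_sims + frac * (max_sims - min_sims)))
```

Under this sketch, the budget starts at `min_sims`, reaches `max_sims` at the end of training, and spends proportionally less computation on early self-play games whose targets are less reliable anyway.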