The recent breakthrough successes in machine learning are mainly attributed to scale: namely, large-scale attention-based architectures and datasets of unprecedented size. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines, which rely on complex heuristics, explicit search, or a combination of both, we train a 270M-parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, yielding roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance arises only at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.