AlphaViT: A Flexible Game-Playing AI for Multiple Games and Variable Board Sizes

This paper presents novel game-playing AI agents based on the AlphaZero framework and enhanced with a Vision Transformer (ViT): AlphaViT, AlphaViD, and AlphaVDA. These agents are designed to play multiple board games on boards of various sizes using a single network with shared weights, thereby overcoming AlphaZero's restriction to a fixed board size. AlphaViT employs only a transformer encoder, whereas AlphaViD and AlphaVDA incorporate both transformer encoders and decoders. In AlphaViD, the decoder processes the outputs of the encoder, whereas AlphaVDA uses learnable embeddings as the decoder input. The additional decoder layers in AlphaViD and AlphaVDA provide flexibility to adapt to various action spaces and board sizes. Experimental results show that the proposed agents, whether trained on individual games or on multiple games simultaneously, consistently outperform traditional algorithms such as Minimax and Monte Carlo Tree Search and approach the performance of AlphaZero, despite using a single deep neural network (DNN) with shared weights. In particular, AlphaViT shows strong performance across all tested games. Furthermore, fine-tuning the DNN from weights pre-trained on small-board games accelerates convergence and improves performance, particularly in Gomoku. Interestingly, simultaneous training on multiple games yields performance comparable to, or even surpassing, single-game training. These results indicate the potential of transformer-based architectures for developing more flexible and robust game-playing AI agents that excel in multiple games and dynamic environments.
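To illustrate why an encoder-only transformer can share one set of weights across board sizes, the following is a minimal sketch in pure Python, not the authors' implementation. It treats each board cell as one token, applies a single self-attention layer, and reads one policy logit per token, so the size of the policy output follows the number of cells. All names (`make_weights`, `forward`), the three-state cell encoding, and the mean-pooled value head are assumptions made for this illustration; positional embeddings, multiple heads, feed-forward layers, and non-placement moves such as passing are omitted for brevity.

```python
import math
import random


def make_weights(dim, seed=0):
    """Create one shared set of randomly initialised weights (hypothetical)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(3, dim),  # one vector per cell state: 0 empty, 1 own, 2 opponent
        "wq": mat(dim, dim),
        "wk": mat(dim, dim),
        "wv": mat(dim, dim),
        "policy": [rng.uniform(-0.5, 0.5) for _ in range(dim)],
        "value": [rng.uniform(-0.5, 0.5) for _ in range(dim)],
    }


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(vec, mat):
    """Multiply a row vector by a matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(board, w):
    """Return (policy, value) for a board given as a list of rows of cell states."""
    # One token per cell; the token count depends on the board, the weights do not.
    tokens = [w["embed"][cell] for row in board for cell in row]
    dim = len(tokens[0])
    q = [matvec(t, w["wq"]) for t in tokens]
    k = [matvec(t, w["wk"]) for t in tokens]
    v = [matvec(t, w["wv"]) for t in tokens]

    # Single-head scaled dot-product self-attention over all cells.
    encoded = []
    for qi in q:
        attn = softmax([dot(qi, kj) / math.sqrt(dim) for kj in k])
        encoded.append([sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(dim)])

    # One policy logit per cell, so the action space scales with the board.
    policy = softmax([dot(tok, w["policy"]) for tok in encoded])
    # Value from the mean-pooled encoding, squashed to [-1, 1].
    pooled = [sum(tok[d] for tok in encoded) / len(encoded) for d in range(dim)]
    value = math.tanh(dot(pooled, w["value"]))
    return policy, value


weights = make_weights(dim=8)
small = [[0] * 3 for _ in range(3)]
large = [[0] * 6 for _ in range(6)]
p_small, v_small = forward(small, weights)
p_large, v_large = forward(large, weights)
print(len(p_small), len(p_large))  # 9 36
```

The same `weights` dictionary serves both the 3x3 and the 6x6 board because every operation acts per token or across the token set. A fully connected policy head over a flattened board would instead tie the network to a single board size.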
