Chess is a canonical example of a task that requires rigorous reasoning and long-term planning. Modern decision Transformers - trained similarly to LLMs - are able to learn competent gameplay, but it is unclear to what extent they truly capture the rules of chess. To investigate this, we train a 270M-parameter chess Transformer and test it on out-of-distribution (OOD) scenarios designed to reveal failures of systematic generalization. Our analysis shows that Transformers exhibit compositional generalization, as evidenced by strong rule extrapolation: they adhere to the fundamental syntactic rules of the game, consistently choosing valid moves even in situations very different from the training data. Moreover, they also generate high-quality moves for OOD puzzles. In a more challenging test, we evaluate the models on variants such as Chess960 (Fischer Random Chess), in which the starting positions of the pieces are randomized. We find that while the model exhibits basic strategy adaptation, it remains inferior to symbolic AI algorithms that perform explicit search, although the gap narrows when playing against users on Lichess. Moreover, the training dynamics reveal that the model initially learns to move only its own pieces, suggesting an emergent compositional understanding of the game.