Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated players. However, these monolithic networks tend to collapse into mode-averaged behavior, blurring stylistic boundaries and suppressing rare but effective strategies. To counteract this homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model whose small GPT experts emulate world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. For each move, a gating network, learned post hoc, selects the most appropriate persona for the current game state, allowing MoM to switch styles dynamically, e.g., adopting Tal's attacking flair or Petrosian's defensive solidity. When evaluated against Stockfish on unseen standard games, MoM outperforms both its dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.
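To make the per-move routing concrete, the following is a minimal sketch (not the authors' implementation) of the mechanism the abstract describes: a learnable gate scores a game-state embedding and selects one small GPT expert, i.e., one grandmaster persona, to generate the next move. All names here (GatingNetwork, route_move, persona_experts, generate) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class GatingNetwork(nn.Module):
    """Scores each grandmaster persona for the current game state."""
    def __init__(self, state_dim: int, n_experts: int):
        super().__init__()
        # A single linear layer is assumed here for simplicity; the
        # paper's gating network may be deeper.
        self.scorer = nn.Linear(state_dim, n_experts)

    def forward(self, state_emb: torch.Tensor) -> torch.Tensor:
        # Softmax over personas; the distribution itself is also useful
        # for interpretability (which style the model leans toward).
        return torch.softmax(self.scorer(state_emb), dim=-1)

def route_move(state_emb, gate, persona_experts, move_prefix):
    """Select the best-matching persona and let it propose the next move.

    persona_experts: list of small GPT experts, each exposing a
    hypothetical generate(prefix) method that continues a move sequence.
    """
    probs = gate(state_emb)               # shape: (n_experts,)
    expert_id = int(torch.argmax(probs))  # hard top-1 routing per move
    return persona_experts[expert_id].generate(move_prefix), expert_id
```

Hard top-1 routing is one plausible reading of "selects the most appropriate persona"; a soft mixture over expert logits would be an equally valid variant under the same gating distribution.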