The development of intelligent game-playing systems has driven significant advances in artificial intelligence, as games provide rigorous testbeds for decision-making, strategic planning, and adaptive learning. Resource-constrained environments remain a critical challenge, however, because conventional deep learning methods rely heavily on large datasets and substantial computational resources. In this paper, we propose a lightweight hybrid framework for the Game of the Amazons that explores the weak-to-strong generalization paradigm by integrating the structural reasoning of graph-based learning with the generative capabilities of large language models (LLMs). Specifically, we use a Graph Attention Autoencoder to inform a multi-step Monte Carlo Tree Search, a Stochastic Graph Genetic Algorithm to optimize evaluation signals, and GPT-4o-mini to generate synthetic training data. Unlike traditional approaches that rely on expert demonstrations, our framework learns from noisy and imperfect supervision. We demonstrate that the graph attention mechanism acts as a structural filter that denoises the LLM's outputs. Experiments on a 10$\times$10 Amazons board show that our hybrid approach improves decision accuracy by 15\%--56\% over baselines and significantly outperforms its teacher model (GPT-4o-mini), achieving a competitive win rate of 45.0\% at $N=30$ nodes and a decisive 66.5\% at only $N=50$ nodes. These results support the feasibility of evolving specialized, high-performance game AI from general-purpose foundation models under stringent computational constraints.