Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, which provide rigorous testbeds for decision-making, strategic planning, and adaptive learning. However, resource-constrained environments pose critical challenges, because conventional deep learning methods rely heavily on extensive datasets and computational resources. In this paper, we propose a lightweight hybrid framework for the Game of the Amazons that explores the paradigm of weak-to-strong generalization by integrating the structural reasoning of graph-based learning with the generative capabilities of large language models. Specifically, we use a Graph Attention Autoencoder to inform a multi-step Monte Carlo Tree Search, a Stochastic Graph Genetic Algorithm to optimize evaluation signals, and GPT-4o-mini to generate synthetic training data. Unlike traditional approaches that rely on expert demonstrations, our framework learns from noisy and imperfect supervision. We demonstrate that the Graph Attention mechanism effectively functions as a structural filter, denoising the LLM's outputs. Experiments on a 10$\times$10 Amazons board show that our hybrid approach not only achieves a 15\%--56\% improvement in decision accuracy over baselines but also significantly outperforms its teacher model (GPT-4o-mini), achieving a competitive win rate of 45.0\% at $N=30$ nodes and a decisive 66.5\% at only $N=50$ nodes. These results verify the feasibility of evolving specialized, high-performance game AI from general-purpose foundation models under stringent computational constraints.