While Large Language Models (LLMs) have achieved remarkable success in formal learning tasks such as mathematics and code generation, they still struggle with the "practical wisdom" and generalizable intelligence, such as strategic creativity and social reasoning, that characterize human cognition. This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction. In this paper, we propose treating games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity to cultivate diverse competencies. To address the performance degradation observed in multi-task learning, we introduce a Nested Training Framework. Unlike naive task mixing, which optimizes an implicit "OR" objective, our framework employs sequential task composition to enforce an explicit "AND" objective, compelling the model to master multiple abilities simultaneously to achieve maximal reward. Using GRPO-based reinforcement learning across Matrix Games, TicTacToe, and Who's the Spy, we demonstrate that integrating game-based informal learning not only prevents task interference but also significantly bolsters the model's generalization across broad ability-oriented benchmarks. The framework and implementation are publicly available. A toy sketch contrasting the two objectives follows.
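To make the "OR" versus "AND" distinction concrete, the following minimal sketch (Python; all function and task names are hypothetical stand-ins, not the paper's actual environments or implementation) contrasts the expected reward under naive task mixing, where each rollout samples a single task, with a multiplicative reward under sequential task composition, where one rollout chains all tasks:

```python
import random
from typing import Callable, List

# Toy policy: maps a task name to a score in [0, 1].
Policy = Callable[[str], float]

def naive_mixing_reward(policy: Policy, tasks: List[str]) -> float:
    """Naive task mixing ("OR" objective): each rollout samples ONE task,
    so the expected reward is an average over tasks -- a policy can score
    well by excelling at some tasks while neglecting others."""
    task = random.choice(tasks)
    return policy(task)

def nested_sequential_reward(policy: Policy, tasks: List[str]) -> float:
    """Sequential composition ("AND" objective): one rollout chains ALL
    tasks and the reward is multiplicative, so a near-zero score on any
    single task collapses the total -- maximal reward requires mastering
    every task simultaneously."""
    reward = 1.0
    for task in tasks:
        reward *= policy(task)
    return reward

if __name__ == "__main__":
    tasks = ["matrix_game", "tictactoe", "whos_the_spy"]
    # A "specialist" that aces one task and fails the rest.
    specialist: Policy = lambda t: 1.0 if t == "matrix_game" else 0.1
    # A "generalist" that is solid on every task.
    generalist: Policy = lambda t: 0.7

    n = 100_000
    for name, pi in [("specialist", specialist), ("generalist", generalist)]:
        mixed = sum(naive_mixing_reward(pi, tasks) for _ in range(n)) / n
        nested = nested_sequential_reward(pi, tasks)
        print(f"{name:10s}  OR (mixing) ~ {mixed:.2f}   AND (nested) = {nested:.3f}")
```

Under the mixing objective the specialist still earns a respectable average reward (about 0.4) despite failing two of three tasks, whereas under the nested objective its reward collapses to 0.01 while the generalist's remains far higher (0.343), illustrating why sequential composition pressures the model toward balanced multi-task competence.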