The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Specifically, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.