The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, most interactive environments emphasize learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's close alignment with human history and society requires sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. In particular, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents many complex features that challenge the agent to deal with open-ended stochastic environments requiring diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.