As artificial intelligence research advances, the platforms used to evaluate AI agents must adapt and grow to keep challenging them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. Our platform is built so that AI agents with different architectures can easily interact with the Minecraft world and can be trained and evaluated on multiple tasks. PAL supports flexible task creation and can manipulate any aspect of a task during an evaluation. All actions taken by AI agents and external actors (non-player characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, along with evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.