Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold that captures three interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay onto the proposed behavior manifold and compare the two. This allows us to interpret differences in policy as higher-level behavioral concepts. For example, we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI, and especially in generative AI research, by offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
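As a rough illustration of the third component, the sketch below projects per-game task counts onto the three named axes and summarizes how widely a population of games is spread along each one. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not list the tasks or say how the manifold is constructed, so the task names, the axis loadings in `AXES`, and the normalized linear scoring are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical task names and axis loadings. The abstract names the three
# axes but not the underlying tasks or how tasks map onto axes.
AXES = {
    "fight_flight": {"attack_enemy": 1.0, "retreat": -1.0},
    "explore_exploit": {"roam_map": 1.0, "hold_objective": -1.0},
    "solo_multi": {"act_alone": 1.0, "act_near_teammates": -1.0},
}


def project(task_counts):
    """Map one game's task counts to a score in [-1, 1] on each axis.

    Assumption: a simple normalized linear projection. An axis with no
    observed tasks gets a neutral score of 0.0.
    """
    scores = {}
    for axis, loadings in AXES.items():
        weighted = sum(w * task_counts.get(task, 0) for task, w in loadings.items())
        total = sum(task_counts.get(task, 0) for task in loadings)
        scores[axis] = weighted / total if total else 0.0
    return scores


def axis_summary(games):
    """Return {axis: (mean, population std dev)} over a list of games."""
    projected = [project(game) for game in games]
    summary = {}
    for axis in AXES:
        values = [p[axis] for p in projected]
        summary[axis] = (mean(values), pstdev(values))
    return summary


if __name__ == "__main__":
    # Toy counts invented purely to exercise the functions; not real data.
    toy_games = [
        {"attack_enemy": 8, "retreat": 2, "roam_map": 5, "hold_objective": 5},
        {"attack_enemy": 2, "retreat": 8, "act_alone": 3, "act_near_teammates": 7},
    ]
    for axis, (avg, spread) in axis_summary(toy_games).items():
        print(f"{axis}: mean={avg:+.2f} spread={spread:.2f}")
```

Running `axis_summary` separately on human games and on AI games, then comparing the spread values per axis, is one simple way to express the kind of variability-versus-uniformity contrast the abstract describes.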