Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project both human and AI gameplay onto the proposed behavior manifold to compare and contrast them. This allows us to interpret differences in policy as higher-level behavioral concepts: for example, we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, whereas humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI, and especially in generative AI research, by offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.