The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, which necessitates measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to characterizing the strategies employed by each demonstrator. However, existing representation learning methods often require player identification or rely on strong assumptions, making them unsuitable for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.