Reinforcement learning has been widely successful in producing agents that play games at a human level. However, this success requires complex reward engineering, and the agent's resulting policy is often unpredictable. Modeling the wide range of human playstyles, many of which are difficult to express as a reward function, requires going beyond reinforcement learning. This paper presents a novel imitation learning approach that generates multiple persona policies for playtesting. Multimodal Generative Adversarial Imitation Learning (MultiGAIL) uses an auxiliary input parameter to learn distinct personas with a single agent model. MultiGAIL builds on generative adversarial imitation learning and uses multiple discriminators as reward models, inferring the environment reward by comparing the agent's policy with distinct expert policies. The reward from each discriminator is weighted according to the auxiliary input. Our experimental analysis demonstrates the effectiveness of our technique in two environments, covering both continuous and discrete action spaces.
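The abstract does not specify how the discriminator outputs are turned into rewards or how the auxiliary input weights them. As a rough illustration only, the sketch below assumes two things: each discriminator outputs the probability that a state-action pair came from its expert, converted to a reward with the common GAIL-style form -log(1 - D); and the auxiliary input is a vector of per-persona weights applied as a linear combination. The function names `gail_reward` and `multigail_reward` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def gail_reward(d_expert_prob: float, eps: float = 1e-8) -> float:
    """Per-discriminator reward (assumed form: -log(1 - D)).

    Assumes D is the discriminator's probability that the agent's
    (state, action) pair came from that discriminator's expert, so the
    reward grows as the agent looks more like the expert. `eps` guards
    against log(0) when D is exactly 1.
    """
    return -math.log(max(1.0 - d_expert_prob, eps))


def multigail_reward(d_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-persona discriminator rewards into one scalar.

    Hypothetical weighting: `weights` plays the role of the auxiliary
    input, one weight per discriminator (e.g. one-hot to select a single
    persona, or fractional values to blend personas).
    """
    if len(d_probs) != len(weights):
        raise ValueError("need one weight per discriminator")
    return sum(w * gail_reward(p) for p, w in zip(d_probs, weights))


# Two personas: the first discriminator finds the transition expert-like,
# the second does not. A one-hot auxiliary input selects a single persona.
print(multigail_reward([0.9, 0.2], [1.0, 0.0]))  # reward from persona 1 only
print(multigail_reward([0.9, 0.2], [0.5, 0.5]))  # blend of both personas
```

With a one-hot weight vector the agent is rewarded only for resembling the selected expert, while fractional weights would mix the discriminator signals; whether MultiGAIL uses this linear scheme is an assumption of the sketch.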