Closed drafting, or "pick and pass", is a popular game mechanic in which, each round, every player selects a card or other playable element from their hand and passes the rest to the next player. In this paper, we establish first-principles methods for studying the interpretability, generalizability, and memory of Deep Q-Network (DQN) models playing closed drafting games. In particular, we use a popular family of closed drafting games called "Sushi Go Party", in which we achieve state-of-the-art performance. We fit decision rules to interpret the decision-making strategy of trained deep reinforcement learning (DRL) agents by comparing them to the ranking preferences of different types of human players. Because Sushi Go Party can be expressed as a set of closely related games determined by the set of cards in play, we quantify the generalizability of DRL models trained on various sets of cards, establishing a method to benchmark agent performance as a function of environment unfamiliarity. Using the explicitly calculable memory of other players' hands in closed drafting games, we create measures of the ability of DRL models to learn memory.
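To make the pick-and-pass mechanic and the notion of explicitly calculable memory concrete, the following is a minimal Python sketch. It is not the environment used in this work: the function names `draft_round` and `known_next_hand`, the placeholder card labels, and the passing direction (player i passes to player i+1) are all assumptions made for illustration, and no Sushi Go Party scoring rules are modeled.

```python
from typing import Callable, List, Tuple

Hand = List[str]


def draft_round(
    hands: List[Hand], pick: Callable[[int, Hand], str]
) -> Tuple[List[str], List[Hand]]:
    """One closed-drafting step: every player keeps one card, then hands rotate.

    Assumption: player i passes the rest of their hand to player (i + 1) % n.
    """
    picks: List[str] = []
    remaining: List[Hand] = []
    for player, hand in enumerate(hands):
        choice = pick(player, hand)
        rest = list(hand)      # copy so the caller's hand is not mutated
        rest.remove(choice)    # remove exactly one copy of the chosen card
        picks.append(choice)
        remaining.append(rest)
    n = len(hands)
    # After passing, player i holds what player (i - 1) % n did not pick.
    passed = [remaining[(i - 1) % n] for i in range(n)]
    return picks, passed


def known_next_hand(seen_hand: Hand, my_pick: str) -> Hand:
    """Hand the next player must now hold, computed only from what we saw."""
    rest = list(seen_hand)
    rest.remove(my_pick)
    return rest


# Hypothetical three-player example with placeholder card labels.
start_hands = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
picks, new_hands = draft_round(start_hands, lambda player, hand: hand[0])

# Player 0 can reconstruct player 1's current hand without observing it.
remembered = known_next_hand(start_hands[0], picks[0])
```

Here `remembered` equals `new_hands[1]`, which is the sense in which another player's hand is exactly computable from a player's own observation history. An agent's use of memory can then be compared against this ground truth.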