In this paper, we explore the susceptibility of the Q-learning algorithm (a classical and widely used reinforcement learning method) to strategic manipulation by sophisticated opponents in games. We quantify how much a strategically sophisticated agent can exploit a naive Q-learner when she knows the opponent's Q-learning algorithm. To this end, we formulate the strategic actor's problem as a Markov decision process whose continuum state space encompasses all possible Q-values, treating the Q-learning algorithm as the underlying dynamical system. We also present a quantization-based approximation scheme to handle the continuum state space and analyze its performance both analytically and numerically.
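The formulation summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a repeated 2x2 game with prisoner's-dilemma-style payoffs chosen purely for illustration, a stateless and myopic Q-learner that plays greedily with a fixed step size, and a uniform grid over Q-values. All names (`U_STRATEGIC`, `U_LEARNER`, `quantize`, `step`, `value_iteration`) and all constants are hypothetical. The strategic agent's state is the learner's Q-vector, her action drives the Q-update, and value iteration on the quantized state space approximates her optimal manipulation policy.

```python
import itertools

# Hypothetical payoffs (illustrative only, not taken from the paper).
# Indexing: [strategic agent's action][Q-learner's action].
U_STRATEGIC = [[3.0, 0.0], [5.0, 1.0]]
U_LEARNER = [[3.0, 5.0], [0.0, 1.0]]

ALPHA = 0.5                            # Q-learner's step size (assumed)
GAMMA = 0.9                            # strategic agent's discount factor (assumed)
Q_MIN, Q_MAX, LEVELS = 0.0, 5.0, 11    # uniform quantization grid (assumed)
STEP = (Q_MAX - Q_MIN) / (LEVELS - 1)


def quantize(q):
    """Map a real Q-value to the index of a nearest grid point, clamped to the grid."""
    idx = round((q - Q_MIN) / STEP)
    return min(max(idx, 0), LEVELS - 1)


def grid_value(idx):
    """Real Q-value represented by a grid index."""
    return Q_MIN + idx * STEP


def learner_action(q):
    """Greedy Q-learner; ties go to action 0 (an assumption of this sketch)."""
    return 0 if q[0] >= q[1] else 1


def step(state, a):
    """One transition of the quantized MDP.

    state: tuple of grid indices, one per learner action.
    a: the strategic agent's action.
    Returns (next_state, reward to the strategic agent).
    """
    q = [grid_value(i) for i in state]
    b = learner_action(q)
    # Stateless, myopic Q-update for the action the learner just played.
    q[b] = (1 - ALPHA) * q[b] + ALPHA * U_LEARNER[a][b]
    next_state = tuple(quantize(v) for v in q)
    return next_state, U_STRATEGIC[a][b]


def value_iteration(sweeps=500):
    """Approximate the strategic agent's value function on the quantized grid."""
    states = list(itertools.product(range(LEVELS), repeat=2))
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_V = {}
        for s in states:
            candidates = []
            for a in (0, 1):
                nxt, reward = step(s, a)
                candidates.append(reward + GAMMA * V[nxt])
            new_V[s] = max(candidates)
        V = new_V

    def q_value(s, a):
        nxt, reward = step(s, a)
        return reward + GAMMA * V[nxt]

    # Greedy policy with respect to the converged value estimates.
    policy = {s: max((0, 1), key=lambda a: q_value(s, a)) for s in states}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    start = (quantize(0.0), quantize(0.0))
    print("value at zero-initialized Q:", V[start])
    print("first action of the strategic agent:", policy[start])
```

A finer grid (larger `LEVELS`) trades computation for a closer approximation of the continuum state space; the sketch makes no claim about the error bounds that the paper analyzes.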