Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them better suited to real-world applications. In this paradigm, uncertainty or disturbances are interpreted as the actions of a second, adversarial agent, so the problem reduces to finding agent policies that are robust to any of the opponent's actions. This paper is the first to propose considering RRL problems within the theory of positional differential games, which gives us theoretically justified intuition for developing a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be used as an approximate solution of both the minimax and the maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over baseline RRL and Multi-Agent RL algorithms in various environments.
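To make the minimax/maximin statement concrete, the following is a minimal sketch on a single state with finitely many actions; it is not the paper's algorithm. The table `q[u, v]`, the helper names `minimax` and `maximin`, and the convention that the agent picks row `u` to minimize a cost while the adversary picks column `v` to maximize it are all assumptions made for illustration. The sketch shows that the two values coincide when the table is separable in `u` and `v` (a simple case in which swapping min and max is harmless, loosely analogous to the role Isaacs's condition plays), and that they can differ otherwise.

```python
import numpy as np


def minimax(q):
    """Agent (rows, minimizer) commits first; adversary (columns, maximizer) replies."""
    return q.max(axis=1).min()


def maximin(q):
    """Adversary (columns, maximizer) commits first; agent (rows, minimizer) replies."""
    return q.min(axis=0).max()


# Hypothetical separable table: q[u, v] = a[u] + b[v].
a = np.array([1.0, 3.0, 2.0])   # contribution of the agent's action
b = np.array([0.5, -1.0])       # contribution of the adversary's action
q_sep = a[:, None] + b[None, :]

# Hypothetical non-separable table (matching pennies): no pure saddle point.
q_mp = np.array([[1.0, -1.0],
                 [-1.0, 1.0]])

print("separable:       ", minimax(q_sep), maximin(q_sep))
print("matching pennies:", minimax(q_mp), maximin(q_mp))
```

In the separable case both orders of optimization give the same value, so a single table can serve both problems; in the matching-pennies case the gap between the two values is exactly what a shared Q-function could not represent.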