We study multi-agent reinforcement learning (multi-agent RL) under differential privacy (DP) constraints. The problem is motivated by real-world applications involving sensitive data, where protecting users' private information is critical. We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where both definitions ensure trajectory-wise privacy protection. We then design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses. The algorithm satisfies the JDP and LDP requirements when instantiated with appropriate privacy mechanisms. Furthermore, for both notions of DP, our regret bound generalizes the best known result for single-agent RL, and it also reduces to the best known result for multi-agent RL without privacy constraints. To the best of our knowledge, these are the first results towards understanding trajectory-wise privacy protection in multi-agent RL.