This paper studies federated linear contextual bandits under user-level differential privacy (DP). We first introduce a unified federated bandits framework that accommodates various definitions of DP in the sequential decision-making setting. We then formally define user-level central DP (CDP) and local DP (LDP) within this framework, and investigate the fundamental trade-offs between the learning regret and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed $\texttt{ROBIN}$ and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.
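To give a feel for the two LDP blow-up factors stated above, the following is a minimal numerical sketch. It only evaluates the expressions $\min\{1/\varepsilon,M\}$ and $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ for sample values; the function name `ldp_blowup_factors` and the chosen values of $\varepsilon$ and $M$ are illustrative assumptions, and the constants and the conditions under which each bound applies are not modeled.

```python
import math
from typing import Tuple


def ldp_blowup_factors(epsilon: float, num_clients: int) -> Tuple[float, float]:
    """Evaluate the two blow-up expressions from the abstract.

    Hypothetical helper for illustration only: it returns
    (min{1/eps, M}, min{1/sqrt(eps), sqrt(M)}) and ignores constants
    and the conditions under which each lower bound holds.
    """
    if epsilon <= 0 or num_clients < 1:
        raise ValueError("epsilon must be positive and num_clients at least 1")
    linear_form = min(1.0 / epsilon, float(num_clients))
    sqrt_form = min(1.0 / math.sqrt(epsilon), math.sqrt(num_clients))
    return linear_form, sqrt_form


if __name__ == "__main__":
    # Sample privacy budgets (assumed values, not taken from the paper).
    for eps in (0.01, 0.25, 1.0):
        linear_form, sqrt_form = ldp_blowup_factors(eps, num_clients=100)
        print(f"eps={eps}: min(1/eps, M)={linear_form:.2f}, "
              f"min(1/sqrt(eps), sqrt(M))={sqrt_form:.2f}")
```

In both expressions, the factor grows as $\varepsilon$ shrinks until it is capped by a term depending only on $M$, so for very small privacy budgets the number of clients determines the factor.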