On Differentially Private Federated Linear Contextual Bandits

We consider the cross-silo federated linear contextual bandit (LCB) problem under differential privacy, where multiple silos (agents) interact with their local users and communicate via a central server to collaborate without sacrificing any user's privacy. We identify three issues in the state of the art: (i) failure of the claimed privacy protection, (ii) an incorrect regret bound due to noise miscalculation, and (iii) an ungrounded communication cost. To resolve these issues, we take a principled two-step approach. First, we design an algorithmic framework consisting of a generic federated LCB algorithm and flexible privacy protocols. Then, leveraging the proposed framework, we study federated LCBs under two different privacy constraints. We first establish privacy and regret guarantees under silo-level local differential privacy, which fix the issues present in the state-of-the-art algorithm. To further improve the regret performance, we next consider the shuffle model of differential privacy, under which we show that our algorithm can achieve nearly "optimal" regret without a trusted server. We accomplish this via two different schemes: one relies on a new result on privacy amplification via shuffling for DP mechanisms, and the other integrates a shuffle protocol for vector sum into the tree-based mechanism. Both might be of independent interest. Finally, we support our theoretical results with numerical evaluations on contextual bandit instances generated from both synthetic and real-life data.
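For readers unfamiliar with the tree-based mechanism mentioned above, the following is a minimal sketch of the classic binary-tree construction for releasing noisy prefix sums. It is not the paper's protocol: the abstract does not give the construction, so the sketch works on scalars rather than vectors, omits the shuffle protocol entirely, and treats the noise scale `sigma` as a given parameter rather than calibrating it to any privacy budget. The names `dyadic_nodes` and `TreeMechanism` are hypothetical.

```python
import random


def dyadic_nodes(t):
    """Return the dyadic intervals (start, length) that exactly cover [0, t)."""
    nodes, start = [], 0
    length = 1 << max(t.bit_length() - 1, 0)
    while length >= 1:
        if t & length:  # this power of two appears in the binary expansion of t
            nodes.append((start, length))
            start += length
        length >>= 1
    return nodes


class TreeMechanism:
    """Illustrative scalar tree-based mechanism (assumed Gaussian noise)."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma              # noise scale, assumed to be given
        self.rng = random.Random(seed)
        self.values = []                # raw stream (kept only for this sketch)
        self.noisy = {}                 # (start, length) -> cached noisy node sum

    def add(self, x):
        self.values.append(x)

    def _node(self, start, length):
        # Noise is drawn once per tree node and then reused, so each stream
        # element affects only the nodes whose interval contains it.
        key = (start, length)
        if key not in self.noisy:
            exact = sum(self.values[start:start + length])
            self.noisy[key] = exact + self.rng.gauss(0.0, self.sigma)
        return self.noisy[key]

    def prefix_sum(self, t=None):
        """Noisy sum of the first t elements (t must not exceed the stream length)."""
        t = len(self.values) if t is None else t
        return sum(self._node(s, l) for s, l in dyadic_nodes(t))
```

Each prefix sum combines at most about log2(t) + 1 noisy nodes, which is why the accumulated noise grows only polylogarithmically in the stream length instead of linearly.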
