We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. Prior work on private contextual bandits has been restricted to linear reward models, which admit closed-form estimators. Generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, so estimation requires private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into the regret analysis. We address these challenges under two privacy models, each paired with a context setting. For stochastic contexts, we design a shuffle-DP algorithm whose regret is $\tilde{O}(d^{3/2}\sqrt{T \log T}/\sqrt{\varepsilon})$ in the dominant term, differing from the non-private rate by a factor of $\sqrt{d/\varepsilon}$. For adversarial contexts, we provide a joint-DP algorithm with regret $\tilde{O}\!\big(d\sqrt{T} \log T + d^{3/4}\sqrt{T/\varepsilon}\,(\log T)\,(d + \log T)^{1/4}\big)$, which matches the non-private rate $\tilde{O}(d\sqrt{T} \log T)$ in the leading term, with privacy contributing only an additive correction. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.
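As a reading aid, the two stated rates can be unpacked with elementary algebra. The following is a sketch derived only from the expressions quoted above, not a result taken from the paper's analysis; constants and the logarithmic factors hidden by $\tilde{O}$ are ignored. Dividing the shuffle-DP rate by the stated factor $\sqrt{d/\varepsilon}$ gives the non-private rate it implies for stochastic contexts, and dividing the additive privacy term of the joint-DP bound by its leading term gives their ratio:

$$
\frac{d^{3/2}\sqrt{T \log T}/\sqrt{\varepsilon}}{\sqrt{d/\varepsilon}} = d\sqrt{T \log T},
\qquad
\frac{d^{3/4}\sqrt{T/\varepsilon}\,(\log T)\,(d + \log T)^{1/4}}{d\sqrt{T}\,\log T} = \frac{1}{\sqrt{\varepsilon}}\Big(1 + \frac{\log T}{d}\Big)^{1/4}.
$$

Under this reading, the privacy cost is multiplicative in the stochastic setting, while in the adversarial setting it enters as a separate additive term whose size relative to the leading term depends on $\varepsilon$ and on $(\log T)/d$.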