We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models -- which admit closed-form estimators -- generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into the regret analysis. We address these challenges under two privacy models and context settings. For stochastic contexts, we design a shuffle-DP algorithm achieving $\tilde{O}(d^{3/2}\sqrt{T}/\sqrt{\varepsilon})$ regret. For adversarial contexts, we provide a joint-DP algorithm with $\tilde{O}(d\sqrt{T}/\sqrt{\varepsilon})$ regret -- matching the non-private rate up to a $1/\sqrt{\varepsilon}$ factor. Both algorithms remove dependence on the instance-specific parameter $\kappa$ (which can be exponential in the dimension) from the dominant $\sqrt{T}$ term. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.