Bandits serve as the theoretical foundation of sequential learning and an algorithmic foundation of modern recommender systems. However, recommender systems often rely on user-sensitive data, making privacy a critical concern. This paper contributes to the understanding of Differential Privacy (DP) in bandits with a trusted centralised decision-maker, and in particular the implications of enforcing zero-Concentrated Differential Privacy (zCDP). First, we formalise and compare different adaptations of DP to bandits, depending on the input considered and the interaction protocol. Then, we propose three private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit settings, namely finite-armed bandits, linear bandits, and linear contextual bandits. The three algorithms share a generic algorithmic blueprint, namely the Gaussian mechanism combined with adaptive episodes, to ensure a good privacy-utility trade-off. We analyse and upper bound the regret of these three algorithms. Our analysis shows that in all of these settings, the price of imposing zCDP is (asymptotically) negligible compared with the regret incurred without privacy constraints. Next, we complement our regret upper bounds with the first minimax lower bounds on the regret of bandits under zCDP. To prove the lower bounds, we develop a new proof technique based on couplings and optimal transport. We conclude by experimentally validating our theoretical results in all three bandit settings.
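To illustrate the shared blueprint (Gaussian mechanism plus adaptive episodes), here is a minimal Python sketch for the finite-armed case. It assumes Bernoulli rewards in [0, 1], a doubling episode length per arm, and an illustrative UCB index whose bonus adds a sampling term and a privacy-noise term. The function names, the doubling schedule details and the bonus constants are hypothetical and are not the paper's exact AdaC-UCB. The one standard fact it relies on is that adding N(0, σ²) noise with σ = Δ/√(2ρ) to a statistic of L2-sensitivity Δ satisfies ρ-zCDP. Because each released mean uses only the fresh rewards of one episode, every reward enters exactly one noisy release. This is roughly why episodes keep the privacy cost low.

```python
import math
import random


def gaussian_mechanism(value, l2_sensitivity, rho, rng):
    """Release value + N(0, sigma^2), sigma = Delta / sqrt(2 * rho).

    For a statistic with L2-sensitivity Delta this satisfies rho-zCDP.
    """
    sigma = l2_sensitivity / math.sqrt(2.0 * rho)
    return value + rng.gauss(0.0, sigma)


def private_episodic_ucb(arm_means, horizon, rho, seed=0):
    """Hypothetical AdaC-UCB-style sketch with arm-dependent doubling episodes.

    Returns the number of pulls of each arm. Only the rewards of the most
    recent complete episode of an arm are averaged and privatised.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    private_mean = [0.0] * k  # last released (noisy) mean per arm
    last_count = [0] * k      # number of rewards behind that release
    episodes = [0] * k        # completed episodes per arm
    pulls = [0] * k
    t = 0

    def index(a):
        if last_count[a] == 0:
            return float("inf")  # force one episode of every arm first
        n = last_count[a]
        log_t = math.log(horizon)
        # Illustrative bonus: sampling term + Gaussian-noise term (assumed constants).
        bonus = math.sqrt(2.0 * log_t / n) + math.sqrt(log_t / rho) / n
        return private_mean[a] + bonus

    while t < horizon:
        a = max(range(k), key=index)
        length = 2 ** episodes[a]  # episode length doubles for this arm
        total = 0.0
        played = 0
        for _ in range(length):
            if t >= horizon:
                break
            total += 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
            played += 1
            t += 1
        pulls[a] += played
        if played == length:
            # The mean of `length` rewards in [0, 1] has L2-sensitivity 1 / length.
            private_mean[a] = gaussian_mechanism(total / length, 1.0 / length, rho, rng)
            last_count[a] = length
            episodes[a] += 1
    return pulls
```

For example, `private_episodic_ucb([0.9, 0.1], horizon=5000, rho=1.0)` should concentrate most pulls on the first arm. Releasing one noisy mean per episode, instead of one per round, means the noise scale shrinks as 1/n with the episode length, so it is eventually dominated by the sampling term.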