This paper investigates regret minimization in multi-armed bandit (MAB) problems under a local differential privacy (LDP) guarantee. Given a fixed privacy budget $\epsilon$, we consider three privatizing mechanisms in the Bernoulli reward setting: linear, quadratic, and exponential. For each mechanism, we derive a stochastic regret bound for the Thompson Sampling algorithm. Finally, we run simulations to illustrate the convergence of the different mechanisms under different privacy budgets.
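To make the setting concrete, the following is a minimal sketch of Thompson Sampling in which the learner only ever sees locally privatized Bernoulli rewards. It does not implement the linear, quadratic, or exponential mechanisms studied in the paper, since their definitions are not given here. As a stand-in, it assumes standard binary randomized response, which satisfies $\epsilon$-LDP by reporting the true bit with probability $e^{\epsilon}/(1+e^{\epsilon})$. The function names, the Beta(1, 1) prior, and the pseudo-regret bookkeeping are illustrative choices rather than the paper's.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Assumed stand-in mechanism: keep the bit w.p. e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def private_thompson_sampling(means, epsilon, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling fed only with privatized rewards.

    `means` holds the true Bernoulli means, used here only to simulate rewards
    and to compute pseudo-regret. Returns (pull counts per arm, pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # privatized 1s observed per arm
    failures = [0] * k   # privatized 0s observed per arm
    pulls = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one value per arm from its Beta posterior (uniform prior).
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # The true reward is drawn on the user side and privatized before release.
        reward = 1 if rng.random() < means[arm] else 0
        noisy = randomized_response(reward, epsilon, rng)
        successes[arm] += noisy
        failures[arm] += 1 - noisy
        pulls[arm] += 1
        regret += best - means[arm]
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = private_thompson_sampling([0.2, 0.8], epsilon=2.0, horizon=2000, seed=1)
    print(pulls, regret)
```

Under this assumed mechanism the privatized reward is still Bernoulli, and the ordering of the arms is preserved, but every gap between arm means shrinks by the factor $(e^{\epsilon}-1)/(e^{\epsilon}+1)$. A smaller $\epsilon$ therefore makes the arms harder to distinguish, which is the kind of privacy-dependent slowdown a regret bound in this setting has to capture.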