In this work we first show that the classical Thompson sampling algorithm for multi-armed bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications -- such as pre-pulling every arm a fixed number of times or increasing the sampling variance -- can provide tighter privacy guarantees. We again provide privacy guarantees, now depending on the new parameters introduced by these modifications, which allows the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm, and show how the new parameters impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes, and demonstrate that tuning the new parameters substantially improves the privacy-regret tradeoff.
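To make the two modifications concrete, the following is a minimal sketch of Gaussian Thompson sampling with both knobs exposed: a `prepulls` count for pre-pulling every arm before adaptive play, and a `var_scale` factor inflating the posterior sampling variance. The parameter names, the Gaussian reward model, and the specific posterior form are illustrative assumptions, not the paper's exact construction.

```python
import math
import random

def thompson_sampling(true_means, T, prepulls=1, var_scale=1.0, seed=0):
    """Gaussian Thompson sampling with two illustrative modifications:
    pre-pull each arm `prepulls` times, and scale the posterior sampling
    variance by `var_scale`. Returns cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    N = len(true_means)
    counts = [0] * N      # number of pulls per arm
    sums = [0.0] * N      # sum of observed rewards per arm
    best = max(true_means)
    regret = 0.0

    def pull(i):
        # Unit-variance Gaussian reward around the arm's true mean.
        counts[i] += 1
        sums[i] += true_means[i] + rng.gauss(0.0, 1.0)

    # Modification 1: pre-pull every arm a fixed number of times.
    for i in range(N):
        for _ in range(prepulls):
            pull(i)
            regret += best - true_means[i]

    for _ in range(T):
        # Modification 2: sample from each arm's posterior with
        # variance inflated by `var_scale`, then play the argmax.
        samples = [
            rng.gauss(sums[i] / counts[i], math.sqrt(var_scale / counts[i]))
            for i in range(N)
        ]
        i = max(range(N), key=lambda j: samples[j])
        pull(i)
        regret += best - true_means[i]

    return regret
```

Larger `prepulls` and `var_scale` add noise and exploration (improving privacy, per the abstract's claims) at the cost of additional regret, which is the tradeoff the tuning is meant to navigate.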