In this paper, we study the stochastic linear bandit problem under the additional requirements of differential privacy, robustness, and batched observations. In particular, we assume that an adversary randomly chooses a constant fraction of the observed rewards in each batch and replaces them with arbitrary numbers. We present differentially private and robust variants of the arm elimination algorithm that use a logarithmic number of batch queries under two privacy models, and we provide regret bounds in both settings. In the first model, every reward in each round is reported by a potentially different client, which reduces to standard local differential privacy (LDP). In the second model, every action is "owned" by a different client, who may aggregate the rewards over multiple queries and privatize the aggregate response instead. To the best of our knowledge, our algorithms are the first to simultaneously provide differential privacy and adversarial robustness in the stochastic linear bandit problem.