Motivated by privacy concerns in sequential decision-making on sensitive data, we study nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP). We develop a uniform-confidence-bound-type estimator and establish its minimax optimality through a matching minimax lower bound. We further consider the setting in which auxiliary datasets are available, each also subject to (possibly heterogeneous) LDP constraints. Under the widely used covariate shift framework, we propose a jump-start scheme that effectively leverages the auxiliary data, and we likewise establish its minimax optimality through a matching lower bound. Comprehensive experiments on both synthetic and real-world datasets validate our theoretical results and demonstrate the effectiveness of the proposed methods.