We study private and robust multi-armed bandits (MABs), where the agent receives heavy-tailed rewards corrupted under Huber's contamination model and must simultaneously guarantee differential privacy. We first establish a minimax lower bound for this problem, characterizing the information-theoretic limit of regret in terms of the privacy budget, the contamination level, and the heavy-tailedness of the rewards. We then propose a meta-algorithm built on a private and robust mean estimation sub-routine, \texttt{PRM}, which relies essentially on reward truncation and the Laplace mechanism alone. For two different heavy-tailed settings, we give concrete instantiations of \texttt{PRM} that achieve nearly optimal regret. As by-products of our main results, we obtain the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination), and our two truncation-based \texttt{PRM} schemes achieve the optimal trade-off between estimation accuracy, privacy, and robustness. Finally, we support our theoretical results with experimental studies.
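To make the truncation-plus-Laplace idea concrete, the following is a minimal sketch of a private mean estimator, not the paper's \texttt{PRM} itself. It assumes truncation means clipping each reward to $[-M, M]$, assumes the replace-one neighboring relation (so the clipped mean has sensitivity $2M/n$), and uses hypothetical names (`private_truncated_mean`, `threshold`, `epsilon`). The paper's specific choice of threshold for each heavy-tailed setting is not reproduced here.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_truncated_mean(
    rewards: Sequence[float],
    threshold: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Clip rewards to [-threshold, threshold], average, and add Laplace noise.

    Illustrative assumption: clipping is used as the truncation rule, and the
    noise is calibrated to the replace-one sensitivity of the clipped mean.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")
    rng = rng or random.Random()
    n = len(rewards)
    # Truncation bounds the influence of heavy tails and of contaminated samples.
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    # Replacing one reward moves the clipped mean by at most 2 * threshold / n.
    sensitivity = 2.0 * threshold / n
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)
```

A call such as `private_truncated_mean(arm_rewards, threshold=5.0, epsilon=1.0)` returns a noisy estimate of the arm's mean; a smaller `threshold` reduces both the privacy noise and the influence of outliers at the cost of more truncation bias, which is the trade-off the abstract refers to.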