The contextual bandit with linear reward functions is one of the most extensively studied models in bandit and online learning research. Recently, there has been increasing interest in designing \emph{locally private} linear contextual bandit algorithms, in which sensitive information contained in contexts and rewards is protected against leakage to the general public. While classical linear contextual bandit algorithms admit cumulative regret upper bounds of $\tilde O(\sqrt{T})$ via several different methods, it has remained open whether such regret bounds are attainable under local privacy constraints, with the state-of-the-art result being $\tilde O(T^{3/4})$. In this paper, we show that it is indeed possible to achieve an $\tilde O(\sqrt{T})$ regret upper bound for locally private linear contextual bandits. Our solution relies on several new algorithmic and analytical ideas, including an analysis of mean absolute deviation errors and a layered principal component regression procedure designed to keep these errors small.