We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem. We introduce a novel condition, named the \textit{Local Anti-Concentration} (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency. We show that the LAC condition is satisfied by a broad class of distributions, including Gaussian, exponential, uniform, Cauchy, and Student's~$t$ distributions, along with other exponential family distributions and their truncated variants. This significantly expands the class of distributions under which greedy algorithms can perform efficiently. Under our proposed LAC condition, we prove that the cumulative expected regret of the greedy algorithm for the linear contextual bandit is bounded by $O(\operatorname{poly} \log T)$. Our results establish the widest range of distributions known to date that allow a sublinear regret bound for greedy algorithms, and further achieve a sharp poly-logarithmic regret bound.
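To make the setting concrete, the following is a minimal simulation sketch of the exploration-free greedy algorithm the abstract refers to: at each round, the learner forms a ridge least-squares estimate of the unknown parameter and pulls the arm whose context maximizes the estimated reward, with no exploration bonus. This sketch is illustrative and not taken from the paper; the names (`theta_star`, `lam`, the noise scale, and the choice of i.i.d. Gaussian contexts, a distribution the abstract lists as satisfying the LAC condition) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, lam = 5, 10, 2000, 1.0           # dimension, arms per round, horizon, ridge parameter (all illustrative)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown true parameter, normalized to unit norm

A = lam * np.eye(d)    # regularized Gram matrix, X^T X + lam * I
b = np.zeros(d)        # X^T y
regret = 0.0

for t in range(T):
    contexts = rng.normal(size=(K, d))         # stochastic Gaussian contexts (satisfy LAC per the abstract)
    theta_hat = np.linalg.solve(A, b)          # ridge least-squares estimate of theta_star
    a = int(np.argmax(contexts @ theta_hat))   # purely greedy choice: no exploration term
    x = contexts[a]
    reward = x @ theta_star + rng.normal(scale=0.1)
    A += np.outer(x, x)                        # rank-one update of the Gram matrix
    b += reward * x
    regret += np.max(contexts @ theta_star) - x @ theta_star  # instantaneous regret vs. best arm

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Under context distributions of this kind, the paper's result says the cumulative expected regret of such a greedy rule grows only poly-logarithmically in $T$, so the printed regret should be far below the horizon.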