We study the benefits of sparsity in nonparametric contextual bandit problems, in which the set of candidate features is countably or uncountably infinite. Our contribution is twofold. First, using a novel reduction to sequences of multi-armed bandit problems, we provide lower bounds on the minimax regret, which show that polynomial dependence on the number of actions is generally unavoidable in this setting. Second, we show that a variant of the Feel-Good Thompson Sampling algorithm enjoys regret bounds that match our lower bounds up to logarithmic factors in the horizon, and that depend only logarithmically on the effective number of candidate features. When we apply our results to kernelised and neural contextual bandits, we find that sparsity enables better regret bounds whenever the horizon is large enough relative to the sparsity level and the number of actions.