In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedback, where the reward function depends linearly on only a few coordinates of the actions' covariates. This setting has applications in many real-world sequential decision-making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called PopArt that enjoys a tighter $\ell_1$ recovery guarantee than Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms whose regret upper bounds improve upon the state of the art (Hao et al., 2020), especially with respect to the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between the upper and lower bounds in prior work.
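To make the setting concrete, the following is a minimal simulation sketch of a sparse linear bandit: a finite action set, a parameter vector with only a few nonzero coordinates, and noisy linear rewards. It does not implement PopArt or the proposed design criterion. All names (`make_sparse_bandit`, `play_uniform`), the problem sizes, the Gaussian noise, and the uniform-random action policy are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def make_sparse_bandit(n_actions=50, dim=20, sparsity=3, seed=0):
    """Build a hypothetical action set and a sparse parameter vector."""
    rng = np.random.default_rng(seed)
    # Each row is the covariate vector of one action (assumed uniform in [-1, 1]).
    actions = rng.uniform(-1.0, 1.0, size=(n_actions, dim))
    # Only `sparsity` coordinates of theta are nonzero.
    theta = np.zeros(dim)
    support = rng.choice(dim, size=sparsity, replace=False)
    theta[support] = rng.choice([-1.0, 1.0], size=sparsity)
    return actions, theta

def play_uniform(actions, theta, horizon=1000, noise_std=0.1, seed=1):
    """Pull actions uniformly at random (a placeholder policy, not a bandit algorithm)."""
    rng = np.random.default_rng(seed)
    mean_rewards = actions @ theta          # expected reward of every action
    best = mean_rewards.max()               # expected reward of the best action
    choices = rng.integers(0, len(actions), size=horizon)
    # Observed reward = linear signal + assumed Gaussian noise.
    rewards = mean_rewards[choices] + noise_std * rng.normal(size=horizon)
    # Cumulative (pseudo-)regret against always playing the best action.
    regret = np.cumsum(best - mean_rewards[choices])
    return choices, rewards, regret

if __name__ == "__main__":
    actions, theta = make_sparse_bandit()
    _, _, regret = play_uniform(actions, theta)
    print("nonzero coordinates of theta:", np.flatnonzero(theta))
    print("cumulative regret of uniform play:", regret[-1])
```

Under uniform play the cumulative regret grows linearly in the horizon, which is the baseline behavior that a sparse linear bandit algorithm aims to improve upon.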