Linear contextual bandits, and LinUCB in particular, are widely used in recommender systems. However, their training, inference, and memory costs grow with the feature dimensionality and the size of the action space. The key bottleneck is the need to update, invert, and store a design matrix that absorbs contextual information from the interaction history. In this paper, we introduce Scalable LinUCB, an algorithm that enables fast and memory-efficient operations with the inverse regularized design matrix. We achieve this through a dynamical low-rank parametrization of its inverse Cholesky-style factors. We derive numerically stable rank-1 and batched updates that maintain the inverse without explicitly forming the entire matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding an average per-step update cost of $O(dr)$ and memory of $O(dr)$ for approximation rank $r$. The inference complexity of the suggested algorithm is $O(dr)$ per action evaluation. Experiments on recommender system datasets demonstrate the effectiveness of our algorithm.
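To make the bottleneck and the low-rank idea concrete, the following is a minimal numerical sketch, not the paper's actual algorithm: it maintains the exact inverse $A^{-1}$ of the regularized design matrix $A = \lambda I + \sum_t x_t x_t^\top$ via Sherman–Morrison rank-1 updates, then exploits the fact that $A^{-1} = \frac{1}{\lambda} I - P$ for a low-rank PSD correction $P$, so that a UCB width $x^\top A^{-1} x$ can be evaluated in $O(dr)$ from an eigendecomposition of $P$. All variable names and the rank choice `r` are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's exact method): for
# A = lam*I + sum_t x_t x_t^T, the inverse satisfies
# A^{-1} = I/lam - P with P symmetric PSD of rank <= number of updates.
# Storing P in factored form U diag(s) U^T (U is d x r) lets the UCB
# width x^T A^{-1} x be evaluated in O(d r) instead of O(d^2).

d, lam = 8, 1.0
rng = np.random.default_rng(0)

# Exact reference inverse, maintained by Sherman-Morrison rank-1 steps.
A_inv = np.eye(d) / lam
X = rng.standard_normal((5, d))  # five observed context vectors
for x_t in X:
    Ax = A_inv @ x_t
    A_inv -= np.outer(Ax, Ax) / (1.0 + x_t @ Ax)  # Sherman-Morrison

# Factor the low-rank correction P = I/lam - A^{-1}.
P = np.eye(d) / lam - A_inv
s, U = np.linalg.eigh(P)          # eigenvalues in ascending order
r = 5                             # here rank(P) = number of updates
U_r, s_r = U[:, -r:], s[-r:]      # keep the top-r eigenpairs

def ucb_width(x):
    # O(d r): x^T A^{-1} x = ||x||^2/lam - sum_i s_i (u_i^T x)^2
    z = U_r.T @ x
    return x @ x / lam - s_r @ (z * z)

x = rng.standard_normal(d)
width_lowrank = ucb_width(x)      # matches x @ A_inv @ x
```

Since five contexts were absorbed, the correction has rank 5 and the factored evaluation is exact here; with `r` smaller than the true rank it becomes an approximation, which is the regime the abstract's dynamical low-rank updates target.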