In this paper, we introduce PSI-LinUCB, a scalable variant of LinUCB that makes training, inference, and memory usage efficient by representing the inverse of the regularized design matrix as the sum of a diagonal matrix and a low-rank correction. We derive numerically stable rank-1 and batched updates that maintain this inverse without explicitly forming the full matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding an average per-step update cost and memory usage of $O(dr)$ for approximation rank $r$. The inference complexity of the proposed algorithm is $O(dr)$ per action evaluation. Experiments on recommender system datasets demonstrate the effectiveness of our algorithm.
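To illustrate why a diagonal-plus-low-rank representation of the inverse keeps action evaluation cheap, the following minimal NumPy sketch builds such a representation for $A = \lambda I + X^\top X$ using the Woodbury identity and evaluates a LinUCB score without forming any $d \times d$ matrix. It is not the PSI-LinUCB update itself: the projector-splitting integrator and the rank-1 and batched updates are not reproduced here, and the function names (`build_factors`, `apply_inverse`, `ucb_score`), the scalar regularizer `lam`, and the exploration weight `alpha` are assumptions made for illustration.

```python
import numpy as np


def build_factors(X, lam):
    """Return (dinv, U, C) such that
    inv(lam * I + X.T @ X) == diag(dinv) - U @ C @ U.T  (Woodbury identity).

    X has shape (r, d): r observed context vectors of dimension d.
    """
    r, d = X.shape
    dinv = np.full(d, 1.0 / lam)                         # diagonal part, length d
    U = X.T.copy()                                       # low-rank factor, d x r
    C = np.linalg.inv(lam * np.eye(r) + X @ X.T) / lam   # small core, r x r
    return dinv, U, C


def apply_inverse(dinv, U, C, v):
    """Compute inv(A) @ v in O(dr + r^2) without forming a d x d matrix."""
    return dinv * v - U @ (C @ (U.T @ v))


def ucb_score(dinv, U, C, theta, x, alpha):
    """LinUCB score x^T theta + alpha * sqrt(x^T inv(A) x) for one action."""
    quad = x @ apply_inverse(dinv, U, C, x)
    # Clamp tiny negative values caused by floating-point round-off.
    return x @ theta + alpha * np.sqrt(max(quad, 0.0))
```

In this sketch only the length-$d$ vector `dinv`, the $d \times r$ factor `U`, and the $r \times r$ core `C` are stored, so memory is $O(dr + r^2)$, which is $O(dr)$ whenever $r \le d$. The parameter estimate can be obtained the same way, as `theta = apply_inverse(dinv, U, C, b)` for a reward-weighted context sum `b`.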