Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea of \cite{jun2019bilinear} by using Stein's method for subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by instead applying a novel exclusion idea to the estimated subspaces, which yields the G-ESTS framework. We also show that, under mild assumptions and up to logarithmic terms, G-ESTT achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ while G-ESTS achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, where $M$ is a problem-dependent quantity. Under the reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret bound of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments on a suite of simulations to illustrate that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods.
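The abstract does not spell out the subspace estimator, so what follows is only a minimal sketch of the first-order Stein identity that underlies Stein-based subspace estimation. It is not G-ESTT or G-ESTS, and it omits their regularization and exclusion stages. The assumptions are ours, not the paper's: during an exploration phase the action matrices have i.i.d. standard Gaussian entries, the GLM link is logistic, and rewards are Bernoulli. The helpers `make_theta_star`, `stein_subspace_estimate` and `subspace_distance` are hypothetical names introduced for illustration. For Gaussian $X$, Stein's identity gives $\mathbb{E}[yX] = \mathbb{E}[\mu'(\langle X, \Theta^* \rangle)]\,\Theta^*$, so the top-$r$ singular vectors of the empirical average of $y_t X_t$ estimate the column and row subspaces of $\Theta^*$.

```python
import numpy as np


def sigmoid(z):
    # Logistic inverse link mu(z); an illustrative GLM choice (assumption).
    return 1.0 / (1.0 + np.exp(-z))


def make_theta_star(d1, d2, singular_values, rng):
    # Hypothetical helper: Theta* = U diag(s) V^T with orthonormal U, V,
    # so Theta* has rank r = len(singular_values).
    r = len(singular_values)
    U, _ = np.linalg.qr(rng.standard_normal((d1, r)))
    V, _ = np.linalg.qr(rng.standard_normal((d2, r)))
    return U @ np.diag(singular_values) @ V.T, U, V


def stein_subspace_estimate(X, y, r):
    # First-order Stein identity for X with i.i.d. N(0, 1) entries:
    # E[y X] = E[mu'(<X, Theta*>)] * Theta*, so the empirical average of
    # y_t X_t is a noisy, rescaled copy of Theta*.
    # Centering y reduces variance; because E[X] = 0, subtracting a
    # constant from y does not change E[(y - c) X].
    y_c = y - y.mean()
    M_hat = np.einsum("t,tij->ij", y_c, X) / len(y)
    U, _, Vt = np.linalg.svd(M_hat)
    # Top-r left/right singular vectors estimate the column/row subspaces.
    return U[:, :r], Vt[:r, :].T


def subspace_distance(A, B):
    # Spectral norm of the difference of orthogonal projectors, i.e. the
    # sine of the largest principal angle between span(A) and span(B).
    return np.linalg.norm(A @ A.T - B @ B.T, ord=2)


rng = np.random.default_rng(0)
d1, d2, n, r = 10, 8, 20000, 2
theta_star, U_star, V_star = make_theta_star(d1, d2, [2.0, 1.5], rng)

# Exploration phase: random Gaussian action matrices and Bernoulli rewards.
X = rng.standard_normal((n, d1, d2))
p = sigmoid(np.einsum("tij,ij->t", X, theta_star))
y = rng.binomial(1, p).astype(float)

U_hat, V_hat = stein_subspace_estimate(X, y, r)
print(subspace_distance(U_hat, U_star), subspace_distance(V_hat, V_star))
```

In this sketch both distances should be well below 1 and shrink as `n` grows. Gaussian exploration is chosen only because Stein's identity takes its simplest form in that case. For other action distributions, Stein's method uses the corresponding score function instead.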
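Substituting the assumed scale $M = O((d_1+d_2)^2)$ into the two stated regret bounds makes the comparison explicit. This is a worked arithmetic step only, with logarithmic factors suppressed as in $\tilde{O}$:

```latex
\begin{align*}
\text{G-ESTT:}\quad \sqrt{(d_1+d_2)\,M\,r\,T}
  &= \sqrt{(d_1+d_2)\cdot O\big((d_1+d_2)^2\big)\cdot rT}
   = O\big((d_1+d_2)^{3/2}\sqrt{rT}\big),\\
\text{G-ESTS:}\quad \sqrt{(d_1+d_2)^{3/2}\,M\,r^{3/2}\,T}
  &= \sqrt{O\big((d_1+d_2)^{7/2}\big)\cdot r^{3/2}\,T}
   = O\big((d_1+d_2)^{7/4}\, r^{3/4}\sqrt{T}\big).
\end{align*}
```

Under this assumption, the G-ESTT bound has the same dependence on $d_1+d_2$, $r$ and $T$ as $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}/D_{rr})$. G-ESTS pays an extra factor of $(d_1+d_2)^{1/4} r^{1/4}$ in exchange for its efficiency gain.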
