Online Low Rank Matrix Completion

We study the problem of {\em online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, the algorithm recommends one item per user and receives a (noisy) reward sampled from a low-rank user-item preference matrix. The goal is to design a method with sub-linear regret (in $\mathsf{T}$) and nearly optimal dependence on $\mathsf{M}$ and $\mathsf{N}$. The problem can easily be mapped to the standard multi-armed bandit problem in which each item is an {\em independent} arm, but this leads to poor regret because the correlation between arms and users is not exploited. On the other hand, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We first demonstrate that the low-rank structure can be exploited using a simple explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to obtain a non-trivial solution. We then improve our result for the rank-$1$ setting, which is itself quite challenging and encapsulates some of the key issues. Here, we propose \textsc{OCTAL} (Online Collaborative filTering using iterAtive user cLustering), which guarantees a nearly optimal regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. OCTAL is based on a novel user-clustering technique that allows iterative elimination of items and leads to a nearly optimal minimax rate.
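To make the explore-then-commit idea concrete, here is a minimal toy sketch on a rank-$1$ reward matrix $R_{ij} = u_i v_j$. It is not the paper's algorithm: the function name `etc_rank1`, uniform-random exploration, Gaussian reward noise, and the use of rank-$1$ alternating least squares (ALS) as the completion step are all illustrative assumptions. Each exploration round costs regret, while a longer exploration phase of length $\tau$ makes the committed recommendation more reliable. In standard ETC analyses, choosing $\tau$ on the order of $\mathsf{T}^{2/3}$ balances these two costs, which is the usual origin of a $\mathsf{T}^{2/3}$ rate.

```python
import random


def etc_rank1(u, v, T, tau, noise=0.1, als_iters=25, seed=0):
    """Hypothetical explore-then-commit sketch for R[i][j] = u[i] * v[j].

    Assumptions (for illustration only): explore with uniformly random items
    for `tau` rounds, fit a rank-1 model to the observed entries with ALS,
    then commit each user to its estimated best item for T - tau rounds.
    Returns (cumulative_regret, committed_items).
    """
    assert 0 <= tau <= T
    rng = random.Random(seed)
    M, N = len(u), len(v)
    best = [max(u[i] * v[j] for j in range(N)) for i in range(M)]

    # Exploration: every user is shown a uniformly random item each round.
    sums, counts = {}, {}
    regret = 0.0
    for _ in range(tau):
        for i in range(M):
            j = rng.randrange(N)
            reward = u[i] * v[j] + rng.gauss(0.0, noise)
            sums[(i, j)] = sums.get((i, j), 0.0) + reward
            counts[(i, j)] = counts.get((i, j), 0) + 1
            regret += best[i] - u[i] * v[j]  # expected-reward gap
    mean = {k: sums[k] / counts[k] for k in sums}  # averaged observed entries

    # Low-rank step: rank-1 ALS using only the observed entries.
    by_user = [[] for _ in range(M)]
    by_item = [[] for _ in range(N)]
    for (i, j), r in mean.items():
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    u_hat, v_hat = [1.0] * M, [0.0] * N
    for _ in range(als_iters):
        for j in range(N):
            den = sum(u_hat[i] ** 2 for i, _ in by_item[j])
            v_hat[j] = sum(u_hat[i] * r for i, r in by_item[j]) / den if den > 0 else 0.0
        for i in range(M):
            den = sum(v_hat[j] ** 2 for j, _ in by_user[i])
            u_hat[i] = sum(v_hat[j] * r for j, r in by_user[i]) / den if den > 0 else 0.0

    # Commit: each user gets its estimated best item for the remaining rounds.
    committed = [max(range(N), key=lambda j: u_hat[i] * v_hat[j]) for i in range(M)]
    for i in range(M):
        regret += (T - tau) * (best[i] - u[i] * v[committed[i]])
    return regret, committed


u = [1.0, 0.8, 0.5, 0.9]
v = [0.2, 0.9, 0.4, 0.1, 0.6]
regret, items = etc_rank1(u, v, T=1000, tau=30)
```

In this toy instance every user shares the same best item (index 1), so once the completion step identifies it, the commit phase adds no regret and the total regret comes entirely from exploration. ALS is used here only because it is a simple, self-contained way to exploit the rank-$1$ structure across users; any matrix-completion estimator could take its place.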
