ReLU matrix decomposition (RMD) is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $Θ$ such that $X\approx \max(0,Θ)$. RMD is a particular instance of nonlinear matrix decomposition (NMD) that finds applications in data compression, matrix completion with entries missing not at random, and manifold learning. The standard RMD model minimizes the least-squares error, that is, $\|X - \max(0,Θ)\|_F^2$. The corresponding optimization problem, Least-Squares RMD (LS-RMD), is nondifferentiable and highly nonconvex. This motivated Saul to propose an alternative model, dubbed Latent-RMD, in which a latent variable $Z$ is introduced that satisfies $\max(0,Z)=X$ while $\|Z - Θ\|_F^2$ is minimized (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J.\ Math.\ Data Sci., 2022). Our first contribution is to show that the two formulations may yield different low-rank solutions $Θ$. We then consider a reparametrization of Latent-RMD, called 3B-RMD, in which $Θ$ is substituted by a low-rank product $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. Our second contribution is to prove the convergence of a block coordinate descent (BCD) approach applied to 3B-RMD. Our third contribution is a novel extrapolated variant of BCD, dubbed eBCD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD compared to BCD, and also show that eBCD performs well against the state of the art on synthetic and real-world data sets.
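To make the 3B-RMD formulation concrete, the following is a minimal sketch of a BCD scheme for $\min_{Z,W,H} \|Z - WH\|_F^2$ subject to $\max(0,Z)=X$: the $Z$-block is pinned to $X$ on the support of $X$ and clipped to be nonpositive elsewhere, while the $W$- and $H$-blocks are closed-form least-squares updates. Function and variable names are illustrative, and this sketch does not include the extrapolation step of eBCD or the exact initialization and stopping rules used by the authors.

```python
import numpy as np

def bcd_3b_rmd(X, r, iters=200, seed=0):
    """Illustrative BCD for 3B-RMD:
    minimize ||Z - W H||_F^2 over Z, W, H
    subject to max(0, Z) = X and W (m x r), H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.standard_normal((m, r))
    H = rng.standard_normal((r, n))
    pos = X > 0  # entries where the constraint pins Z to X
    for _ in range(iters):
        Theta = W @ H
        # Z-block: Z = X on the support of X; elsewhere Z = min(0, Theta),
        # the closest nonpositive value to Theta.
        Z = np.where(pos, X, np.minimum(0.0, Theta))
        # W-block: least-squares solution with H fixed.
        W = Z @ np.linalg.pinv(H)
        # H-block: least-squares solution with W fixed.
        H = np.linalg.pinv(W) @ Z
    return W, H

# Usage on a planted instance X = max(0, W0 H0) of rank 4.
rng = np.random.default_rng(1)
W0 = rng.standard_normal((30, 4))
H0 = rng.standard_normal((4, 25))
X = np.maximum(0, W0 @ H0)
W, H = bcd_3b_rmd(X, r=4)
err = np.linalg.norm(X - np.maximum(0, W @ H)) / np.linalg.norm(X)
```

Each block update minimizes the objective exactly with the other blocks fixed, so the objective is monotonically nonincreasing along the iterations; the convergence guarantees for this scheme (and its extrapolated variant) are the subject of the paper.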