This article introduces new multiplicative updates for nonnegative matrix factorization with the $\beta$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) must be controlled to avoid an ill-posed formulation. Standard practice is to constrain the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach reparametrizes the original problem as the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they apply to any $\beta$-divergence (i.e., any value of $\beta$), and they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods on various datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods obtain solutions of similar quality at convergence (similar objective values) but with significantly reduced CPU times.
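The scaling issue and the scale-invariant reformulation can be illustrated numerically. The sketch below is not the algorithm of the article: it only evaluates objective functions, and it assumes an $\ell_{1}$ penalty on the activations together with $\ell_{1}$ normalization of the dictionary columns (the article's exact choice of norm may differ). The names `beta_divergence`, `penalized_objective`, and `scale_invariant_objective` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def beta_divergence(V, V_hat, beta):
    """Sum of elementwise beta-divergences d_beta(V | V_hat), strictly positive inputs."""
    if beta == 1:  # generalized Kullback-Leibler divergence
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1)
    return np.sum(
        (V**beta + (beta - 1) * V_hat**beta - beta * V * V_hat ** (beta - 1))
        / (beta * (beta - 1))
    )


def penalized_objective(V, W, H, beta, lam):
    """Data fit plus l1 penalty on H, with no control on the norm of W."""
    return beta_divergence(V, W @ H, beta) + lam * np.sum(H)


def scale_invariant_objective(V, W, H, beta, lam):
    """Penalty weights each row of H by the l1 norm of the matching column of W.

    Assumption: this is the objective obtained by substituting
    w_k / ||w_k||_1 and ||w_k||_1 * h_k into the unit-norm-constrained problem.
    """
    col_norms = np.sum(W, axis=0)
    return beta_divergence(V, W @ H, beta) + lam * np.sum(col_norms * np.sum(H, axis=1))


rng = np.random.default_rng(0)
V = rng.random((6, 5)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((3, 5)) + 0.1
lam = 0.5

# Rescale column k of W by s_k and row k of H by 1 / s_k: the product W @ H is unchanged.
scales = np.array([10.0, 4.0, 3.0])
W_scaled = W * scales
H_scaled = H / scales[:, None]

# Normalize the columns of W to unit l1 norm and compensate in H.
col_norms = np.sum(W, axis=0)
W_unit = W / col_norms
H_unit = H * col_norms[:, None]

for beta in (0, 0.5, 1, 2):
    # The plain penalized objective can be decreased by rescaling alone (ill-posedness).
    plain_before = penalized_objective(V, W, H, beta, lam)
    plain_after = penalized_objective(V, W_scaled, H_scaled, beta, lam)
    # The scale-invariant objective does not change under the same rescaling.
    inv_before = scale_invariant_objective(V, W, H, beta, lam)
    inv_after = scale_invariant_objective(V, W_scaled, H_scaled, beta, lam)
    print(beta, plain_before, plain_after, inv_before, inv_after)
```

Under these assumptions, the plain penalized objective decreases when the dictionary is scaled up and the activations are scaled down, whereas the scale-invariant objective is unaffected and coincides with the penalized objective once the dictionary columns have unit $\ell_{1}$ norm.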