This article introduces new multiplicative updates for nonnegative matrix factorization with the $\beta$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) must be controlled to avoid an ill-posed formulation. Standard practice is to constrain the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach instead reparametrizes the original problem as the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they can be applied to any $\beta$-divergence (i.e., any value of $\beta$), and they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods on several datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods reach solutions of similar quality at convergence (similar objective values) with significantly reduced CPU times.
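To make the ill-posedness concrete, the following is a minimal Python sketch, not the algorithm proposed in the article. It assumes the standard definition of the $\beta$-divergence and an $\ell_{1}$ penalty on the activation matrix. The names `beta_divergence`, `penalized_objective`, `lam`, and the random test matrices are illustrative assumptions. Rescaling the dictionary by `c` and the activations by `1 / c` leaves the product unchanged, so the fit term is constant while the penalty shrinks toward zero.

```python
import numpy as np


def beta_divergence(V, V_hat, beta):
    """Sum of element-wise beta-divergences d_beta(V | V_hat), entries assumed > 0."""
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / V_hat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    # General case; beta = 2 gives half the squared Euclidean distance.
    num = V**beta + (beta - 1) * V_hat**beta - beta * V * V_hat ** (beta - 1)
    return float(np.sum(num) / (beta * (beta - 1)))


def penalized_objective(V, W, H, beta, lam):
    """Fit term plus an l1 penalty on the (nonnegative) activations H."""
    return beta_divergence(V, W @ H, beta) + lam * float(np.sum(H))


# Hypothetical toy data: strictly positive so every divergence is finite.
rng = np.random.default_rng(0)
V = rng.random((6, 8)) + 0.1
W = rng.random((6, 3)) + 0.1  # dictionary
H = rng.random((3, 8)) + 0.1  # activations

# Scale W up and H down by the same factor c: W @ H is unchanged,
# but the l1 penalty on H is divided by c.
scales = (1.0, 10.0, 100.0)
objectives = [penalized_objective(V, c * W, H / c, beta=1, lam=0.5) for c in scales]
fits = [beta_divergence(V, (c * W) @ (H / c), beta=1) for c in scales]

for c, obj, fit in zip(scales, objectives, fits):
    print(f"c = {c:6.1f}  fit = {fit:.6f}  penalized objective = {obj:.6f}")
```

In this sketch the penalized objective keeps decreasing as `c` grows while the fit stays the same, which is why the norm of the dictionary has to be controlled, either by a unit-norm constraint on its columns or by a scale-invariant reformulation.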