Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims to find two lower-dimensional matrices, $W$ and $H$, such that $X\approx WH$, where the factors $W$ and $H$ are constrained to be element-wise nonnegative. The factor $W$ serves as a basis for the columns of $X$. To obtain more interpretable and unique solutions, minimum-volume NMF (MinVol NMF) minimizes the volume of $W$. In this paper, we consider the dual approach, where the volume of $H$ is maximized instead; this is referred to as maximum-volume NMF (MaxVol NMF). MaxVol NMF is identifiable under the same conditions as MinVol NMF in the noiseless case, but it behaves rather differently in the presence of noise. In practice, MaxVol NMF is much more effective at extracting a sparse decomposition and does not generate rank-deficient solutions. In fact, we prove that the solutions of MaxVol NMF with the largest volume correspond to clustering the columns of $X$ into disjoint clusters, while the solutions of MinVol NMF with the smallest volume are rank deficient. We propose two algorithms to solve MaxVol NMF. We also present a normalized variant of MaxVol NMF that exhibits better performance than both MinVol NMF and MaxVol NMF, and that can be interpreted as a continuum between standard NMF and orthogonal NMF. We illustrate our results in the context of hyperspectral unmixing.
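To make the quantities named in the abstract concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It fits a plain NMF $X\approx WH$ with the classical multiplicative updates and then evaluates a log-determinant volume surrogate for each factor. The function names `nmf_multiplicative` and `logdet_volume`, the regularization parameter `delta`, and the choice of $\log\det(\cdot + \delta I)$ as the volume measure are assumptions made for illustration; the paper may define and optimize the volume differently.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF (no volume term) via multiplicative updates.

    Hypothetical helper for illustration: minimizes ||X - W H||_F^2
    subject to W >= 0 and H >= 0.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so the
        # factors stay element-wise nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def logdet_volume(G, delta=1e-6):
    """Assumed volume surrogate: log det(G + delta * I) for an r-by-r Gram matrix G."""
    r = G.shape[0]
    _, logdet = np.linalg.slogdet(G + delta * np.eye(r))
    return logdet

# Synthetic nonnegative data with an exact rank-3 factorization.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
H_true = rng.random((3, 50))
X = W_true @ H_true

W, H = nmf_multiplicative(X, r=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# MinVol NMF would penalize the first quantity; MaxVol NMF would reward the second.
vol_W = logdet_volume(W.T @ W)  # volume surrogate for the basis W
vol_H = logdet_volume(H @ H.T)  # volume surrogate for the coefficients H
print(f"relative error: {rel_err:.3e}")
print(f"logdet volume of W: {vol_W:.3f}, of H: {vol_H:.3f}")
```

Because $WH = (WQ)(Q^{-1}H)$ for any invertible $Q$, the two volumes are coupled: shrinking one factor enlarges the other, which is one way to read the "dual" relationship between MinVol and MaxVol NMF. A volume-regularized formulation would also need a normalization of one factor to make the volume meaningful, which this sketch omits.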