In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$, where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF) and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval, for example when the rows of $X$ represent images, or when $X$ is a rating matrix, as in the Netflix and MovieLens datasets, where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily interpretable decomposition that provides a soft clustering of the columns of $X$, but also implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. We first propose a fast algorithm for BSSMF that also handles missing data in $X$. We then provide identifiability conditions for BSSMF, that is, conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: feature extraction from a set of images, and matrix completion for recommender systems.
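The interval-preservation property follows from a one-line argument: if $a_i \le W(i,k) \le b_i$ for all $k$ and each column of $H$ is nonnegative and sums to one, then $(WH)(i,j) = \sum_k W(i,k) H(k,j)$ is a convex combination of values in $[a_i, b_i]$ and therefore also lies in $[a_i, b_i]$. The pure-Python sketch below checks this numerically. It is not the authors' algorithm; it only shows the two Euclidean projections (entrywise clipping onto the bounds, and projection onto the probability simplex via the standard sort-based method) that a projected-gradient scheme for BSSMF would plausibly rely on. The bound vectors `a`, `b` and all helper names (`project_simplex`, `project_bounds`, `bssmf_feasible_product`) are hypothetical.

```python
import random


def project_simplex(v):
    """Euclidean projection of vector v onto {h : h >= 0, sum(h) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # keep the threshold of the largest index j satisfying the condition
    return [max(x - theta, 0.0) for x in v]


def project_bounds(W, a, b):
    """Clip each entry W[i][k] to the interval [a[i], b[i]] (row i shares one interval)."""
    return [[min(max(w, a[i]), b[i]) for w in row] for i, row in enumerate(W)]


def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def bssmf_feasible_product(m, n, r, a, b, seed=0):
    """Draw arbitrary W (m x r) and H (r x n), project them onto the BSSMF constraints, return W, H, WH."""
    rng = random.Random(seed)
    W = project_bounds([[rng.uniform(-3, 8) for _ in range(r)] for _ in range(m)], a, b)
    cols = [project_simplex([rng.gauss(0, 1) for _ in range(r)]) for _ in range(n)]
    H = [[cols[j][k] for j in range(n)] for k in range(r)]  # r x n, column stochastic
    return W, H, matmul(W, H)


if __name__ == "__main__":
    a, b = [1.0] * 4, [5.0] * 4  # e.g. every rating lies in [1, 5]
    W, H, X = bssmf_feasible_product(m=4, n=6, r=3, a=a, b=b)
    # Every entry of WH stays in [a_i, b_i], up to floating-point rounding.
    print(all(a[i] - 1e-12 <= X[i][j] <= b[i] + 1e-12 for i in range(4) for j in range(6)))  # → True
```

Because the check only uses the constraints on $W$ and $H$, the same argument holds for any bounds and any rank $r$; the tolerance `1e-12` absorbs rounding in the column sums of $H$.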