In this paper, we introduce a new methodology for the orthogonal nonnegative matrix factorization (ONMF) problem, in which an input data matrix is approximated by the product of two nonnegative matrices, a features matrix and a mixing matrix, one of which is constrained to be orthogonal. We show that ONMF can be interpreted as a specific facility-location problem (FLP), and we adapt a maximum-entropy-principle-based solution of the FLP to ONMF. The proposed approach guarantees orthogonality and sparsity of either the features matrix or the mixing matrix, while ensuring nonnegativity of both. Additionally, our methodology provides a quantitative characterization of the "true" number of underlying features, a hyperparameter that ONMF requires. Evaluations on synthetic datasets and on a standard genetic microarray dataset indicate significantly better sparsity, orthogonality, and speed than similar methods in the literature, with comparable or lower reconstruction errors.
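The abstract does not spell out the algorithm, so the following is only a rough sketch of the facility-location view, not the authors' method. It rests on one fact and several assumptions. The fact: if H is nonnegative and H Hᵀ = I, its rows have disjoint supports, so each column of H has at most one nonzero entry and every data column is a scaled copy of a single feature column, which turns ONMF into an assignment (facility-location) problem. The assumptions: columns of X are the data points, H is the orthogonal factor, the distortion is the squared Euclidean distance between unit-normalized columns and facilities, and the maximum-entropy solution is approximated by mass-constrained deterministic annealing with geometric cooling. A facility is split only when the temperature drops below 2·λ_max of its conditional covariance. The names `onmf_da`, `_anneal_step`, `_maybe_split` and the parameters `alpha`, `T_min_ratio`, `delta` are hypothetical.

```python
import numpy as np


def _anneal_step(Xn, Y, m, T, max_iter=2000, tol=1e-10):
    """Mass-constrained deterministic-annealing fixed-point iterations at temperature T."""
    for _ in range(max_iter):
        D = ((Xn[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # n x K squared distances
        logits = np.log(np.maximum(m, 1e-300))[None, :] - D / T
        logits -= logits.max(axis=1, keepdims=True)               # avoid overflow in exp
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)                          # Gibbs (max-entropy) associations
        m = P.mean(axis=0)                                         # facility masses
        Y_new = (P.T @ Xn) / np.maximum(P.sum(axis=0), 1e-300)[:, None]
        done = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if done:
            break
    return Y, m, P


def _maybe_split(Xn, Y, m, P, T, k, delta=1e-3):
    """Split the most unstable facility if T is below its critical temperature 2*lambda_max."""
    if len(Y) >= k:
        return Y, m
    best = None
    for j in range(len(Y)):
        w = P[:, j] / max(P[:, j].sum(), 1e-300)
        diff = Xn - Y[j]
        C = (w[:, None] * diff).T @ diff                           # conditional covariance of facility j
        lam, vecs = np.linalg.eigh(C)                              # ascending eigenvalues
        ratio = 2.0 * lam[-1] / T
        if ratio > 1.0 and (best is None or ratio > best[0]):
            best = (ratio, j, vecs[:, -1])
    if best is not None:
        _, j, v = best
        Y = np.vstack([Y, Y[j] + delta * v])                       # perturbed copy along principal axis
        Y[j] = Y[j] - delta * v
        m = np.append(m, m[j] / 2.0)                               # share the mass between the copies
        m[j] = m[j] / 2.0
    return Y, m


def onmf_da(X, k, alpha=0.9, T_min_ratio=1e-3):
    """Hypothetical sketch: X ~ W @ H with W, H >= 0 and H @ H.T = I, via max-entropy clustering."""
    X = np.asarray(X, dtype=float)
    norms = np.maximum(np.linalg.norm(X, axis=0), 1e-12)
    Xn = (X / norms).T                                             # n x d: unit-norm columns as points
    Y = Xn.mean(axis=0, keepdims=True)                             # start with one facility at the centroid
    m = np.ones(1)
    Xc = Xn - Xn.mean(axis=0)
    T = max(1.2 * 2.0 * np.linalg.eigvalsh(Xc.T @ Xc / len(Xn))[-1], 1e-12)  # above first critical T
    T_min = T_min_ratio * T
    while T > T_min:
        Y, m, P = _anneal_step(Xn, Y, m, T)
        Y, m = _maybe_split(Xn, Y, m, P, T, k)
        T *= alpha                                                 # geometric cooling schedule
    Y, m, _ = _anneal_step(Xn, Y, m, T)
    labels = ((Xn[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)  # hard assignment
    K = len(Y)
    W = Y.T.copy()                                                 # d x K facilities, nonnegative
    H = np.zeros((K, X.shape[1]))
    for j in range(K):
        idx = labels == j
        H[j, idx] = np.maximum(Y[j] @ X[:, idx], 0.0) / (Y[j] @ Y[j])  # least-squares scale per column
        s = np.linalg.norm(H[j])
        if s > 0:                                                  # rescale so H has orthonormal rows
            H[j] /= s
            W[:, j] *= s
    return W, H, labels
```

Because each column of H is nonzero in at most one row, the product W @ H only rescales facility columns. Normalizing the rows of H and compensating in W therefore gives H @ H.T = I exactly (for nonempty clusters) without changing the reconstruction. The sketch does not attempt the paper's characterization of the "true" number of features. It simply stops splitting once `k` facilities exist.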