Symmetric Nonnegative Matrix Factorization (SymNMF) is a technique in data analysis and machine learning that approximates a symmetric matrix by the product of a nonnegative, low-rank matrix and its transpose. To obtain faster and more scalable SymNMF algorithms, we develop two randomized methods. The first uses randomized matrix sketching to compute an initial low-rank approximation of the input matrix and then uses this approximation to compute a SymNMF rapidly. The second builds on the fact that many successful SymNMF methods rely on (approximately) solving sequences of constrained least squares problems: it uses randomized leverage score sampling to solve these problems approximately. We prove that leverage score sampling can approximately solve nonnegative least squares problems to a chosen accuracy with high probability. Finally, we demonstrate that both methods work well in practice by applying them to graph clustering tasks on large real-world data sets. These experiments show that our methods approximately maintain solution quality while achieving significant speedups on both large dense and large sparse problems.
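To make the two randomized building blocks concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The Gaussian test matrix, the oversampling parameter, exact leverage scores computed from a QR factorization, and the projected-gradient NNLS solver are all illustrative assumptions. The function names (`randomized_sym_lowrank`, `leverage_scores`, `nnls_pg`, `sampled_nnls`) are hypothetical. `randomized_sym_lowrank` returns `Q` and `C` with X ≈ Q C Qᵀ. `sampled_nnls` draws `s` rows with probability proportional to their leverage scores, rescales them so the sampled normal equations are unbiased, and solves the smaller nonnegative least squares problem.

```python
import numpy as np


def randomized_sym_lowrank(X, k, oversample=10, rng=None):
    """Randomized range finder for a symmetric X (illustrative sketch).

    Returns Q (orthonormal columns) and C = Q^T X Q so that X ~= Q @ C @ Q.T.
    The Gaussian test matrix and oversampling amount are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal((X.shape[0], k + oversample))  # random sketch
    Q, _ = np.linalg.qr(X @ omega)  # orthonormal basis for the sketched range
    C = Q.T @ X @ Q                 # small symmetric core matrix
    return Q, C


def leverage_scores(A):
    """Exact row leverage scores: squared row norms of an orthonormal basis of range(A)."""
    Q, _ = np.linalg.qr(A)  # reduced QR, Q is m x n for a tall, full-rank A
    return np.sum(Q ** 2, axis=1)


def nnls_pg(A, b, iters=2000):
    """Projected gradient for min ||A x - b||^2 subject to x >= 0 (hypothetical helper).

    Uses the fixed step 1/L, where L = ||A||_2^2 is the Lipschitz constant
    of the gradient A^T (A x - b).
    """
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)  # gradient step, then project onto x >= 0
    return x


def sampled_nnls(A, b, s, rng=None):
    """Approximate NNLS by leverage score row sampling (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    p = leverage_scores(A)
    p = p / p.sum()  # sampling probabilities proportional to leverage scores
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    w = 1.0 / np.sqrt(s * p[idx])  # rescaling so E[S^T S] = I
    return nnls_pg(w[:, None] * A[idx], w * b[idx])
```

Computing exact leverage scores by QR costs about as much as solving the full problem. The sampling therefore only pays off when the scores are cheap relative to the solves, for example a tall coefficient matrix reused against many right-hand sides, or when the scores are approximated. This sketch does neither optimization.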