We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or twice. We consider compression through randomized sketching methods that are either adapted to the data or oblivious to it. We formulate optimization problems that depend only on the compressed data, yet can recover a nonnegative factorization that closely approximates the original matrix. These problems can be approached with a variety of algorithms; in particular, we discuss variations of the popular multiplicative updates method for the compressed problems. We demonstrate the success of our approaches empirically and validate their performance in real-world applications.
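As a rough illustration of the two ingredients named above, the following sketch shows an oblivious Gaussian sketch computed in a single pass over the data, alongside the classical (uncompressed) multiplicative updates for the Frobenius-norm objective. It is not the compressed formulation or the update variants proposed in this work, which are not reproduced here. The function names `gaussian_sketch` and `nmf_multiplicative`, the sketch size `k`, the rank `r`, and the synthetic test matrix are all assumptions made for illustration.

```python
import numpy as np


def gaussian_sketch(X, k, rng):
    """Oblivious sketch (illustrative): A has i.i.d. N(0, 1/k) entries.

    With this scaling, E[A.T @ A] is the identity, so ||A @ M||_F^2 is an
    unbiased estimate of ||M||_F^2 for any fixed matrix M.
    """
    m = X.shape[0]
    A = rng.standard_normal((k, m)) / np.sqrt(k)
    return A, A @ X  # forming A @ X needs a single pass over X


def nmf_multiplicative(X, r, n_iter=200, eps=1e-12, rng=None):
    """Classical multiplicative updates for min ||X - U V^T||_F, U, V >= 0.

    This is the standard uncompressed baseline, not a compressed variant.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    # Strictly positive initialization keeps every update well defined.
    U = rng.random((m, r)) + 0.1
    V = rng.random((n, r)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Elementwise updates; nonnegativity is preserved automatically.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        errors.append(np.linalg.norm(X - U @ V.T))
    return U, V, errors


# Small synthetic example: an exactly rank-5 nonnegative matrix.
rng = np.random.default_rng(0)
X = rng.random((60, 5)) @ rng.random((5, 40))

A, AX = gaussian_sketch(X, k=20, rng=rng)
U, V, errors = nmf_multiplicative(X, r=5, rng=rng)

print("sketch shape:", AX.shape)
print("sketched / true norm:", np.linalg.norm(AX) / np.linalg.norm(X))
print("relative fit error:", errors[-1] / np.linalg.norm(X))
```

The sketch `AX` has only `k` rows, so any objective written in terms of `AX` and `A @ U` avoids touching the full matrix `X` again. Note that `A.T @ A` generally has entries of mixed sign, which is one reason the plain updates above cannot be applied to a sketched objective without modification.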