This study provides a foundation for understanding the relationship between non-negative matrix factorization (NMF) and non-negative autoencoders, enabling proper interpretation of autoencoder-based alternatives to NMF. Since its introduction, NMF has been a popular tool for extracting interpretable, low-dimensional representations of high-dimensional data. Recently, however, several studies have proposed replacing NMF with autoencoders. The increasing popularity of autoencoders warrants an investigation of whether this replacement is valid and reasonable in general. Moreover, the exact relationship between non-negative autoencoders and NMF has not been thoroughly explored, so a main aim of this study is to investigate that relationship in detail. We find that the connection between the two models can be established through convex NMF, which is a restricted case of NMF. In particular, convex NMF is a special case of an autoencoder. We compare the performance of NMF and autoencoders in the context of extracting mutational signatures from cancer genomics data. We find that reconstructions based on NMF are more accurate than those based on autoencoders, while the signatures extracted by the two methods show comparable consistency and comparable values when externally validated. These findings suggest that the non-negative autoencoders investigated in this article do not improve on NMF for mutational signature extraction.
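The claim that convex NMF is a special case of an autoencoder can be illustrated with a minimal sketch. It assumes a data matrix V with one sample per row, a single hidden layer, no bias terms, an identity activation, and non-negative weights. Under those assumptions the autoencoder computes V W_enc W_dec, so writing H = V W_enc gives a factorization V ≈ H W_dec in which one factor is built from non-negative combinations of the data itself, which is the kind of restriction that characterizes convex NMF. The names `W_enc`, `W_dec`, `autoencoder_forward`, `reconstruction_loss`, and `fit_projected_gradient` are hypothetical, and projected gradient descent is used here only for simplicity; it is not claimed to be the training procedure used in the study.

```python
import numpy as np


def autoencoder_forward(V, W_enc, W_dec):
    """Linear autoencoder without bias: encode rows of V, then decode."""
    H = V @ W_enc        # latent codes; non-negative when V and W_enc are
    V_hat = H @ W_dec    # reconstruction, equal to V @ W_enc @ W_dec
    return H, V_hat


def reconstruction_loss(V, W_enc, W_dec):
    """Half the squared Frobenius norm of the reconstruction error."""
    _, V_hat = autoencoder_forward(V, W_enc, W_dec)
    return 0.5 * np.sum((V_hat - V) ** 2)


def fit_projected_gradient(V, k, steps=500, lr=1e-3, seed=0):
    """Fit non-negative weights by projected gradient descent.

    Assumption: this simple optimizer is chosen for illustration only.
    After every gradient step the weights are clipped at zero, which
    keeps both weight matrices non-negative.
    """
    rng = np.random.default_rng(seed)
    n_features = V.shape[1]
    W_enc = rng.uniform(0.0, 0.1, size=(n_features, k))
    W_dec = rng.uniform(0.0, 0.1, size=(k, n_features))
    for _ in range(steps):
        H, V_hat = autoencoder_forward(V, W_enc, W_dec)
        R = V_hat - V                       # residual
        grad_dec = H.T @ R                  # d loss / d W_dec
        grad_enc = V.T @ (R @ W_dec.T)      # d loss / d W_enc
        W_enc = np.maximum(W_enc - lr * grad_enc, 0.0)
        W_dec = np.maximum(W_dec - lr * grad_dec, 0.0)
    return W_enc, W_dec


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.uniform(0.0, 1.0, size=(20, 6))   # synthetic non-negative data
    W_enc, W_dec = fit_projected_gradient(V, k=3)
    H, V_hat = autoencoder_forward(V, W_enc, W_dec)
    print("loss:", reconstruction_loss(V, W_enc, W_dec))
    print("all factors non-negative:",
          bool((H >= 0).all() and (W_dec >= 0).all()))
```

In this reading, `H` plays the role of one NMF factor and `W_dec` the other. Unconstrained NMF would optimize `H` freely, whereas here it is tied to the input through `W_enc`, which is consistent with the abstract's observation that NMF reconstructions can be more accurate.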