Non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) decompose non-negative high-dimensional data into non-negative low-rank components. NMF and NTF methods are popular for their intrinsic interpretability and effectiveness on large-scale data. Recent work developed Stratified-NMF, which applies NMF to regimes where data may come from different sources (strata) with different underlying distributions, and seeks to recover both strata-dependent information and global topics shared across strata. Applying Stratified-NMF to multi-modal data requires flattening across modes, and therefore loses the geometric structure implicit in the tensor. To address this problem, we extend Stratified-NMF to the tensor setting by developing a multiplicative update rule and demonstrating the method on text and image data. We find that Stratified-NTF can identify interpretable topics with lower memory requirements than Stratified-NMF. We also introduce a regularized version of the method and demonstrate its effects on image data.
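For context, the multiplicative update rule mentioned above builds on the classic multiplicative updates for plain NMF (in the style of Lee and Seung). The sketch below shows those standard NMF updates on illustrative random data; it is not the paper's Stratified-NTF rule, only the standard building block that such methods extend. All variable names and dimensions are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of standard NMF multiplicative updates (not the paper's
# Stratified-NTF rule). Factorize non-negative X (m x n) as W @ H with
# W (m x r) >= 0 and H (r x n) >= 0.
rng = np.random.default_rng(0)
X = rng.random((60, 40))            # illustrative non-negative data
r = 5                               # target rank (assumption)
W = rng.random((60, r)) + 1e-3      # positive initialization
H = rng.random((r, 40)) + 1e-3
eps = 1e-10                         # guard against division by zero

err0 = np.linalg.norm(X - W @ H) / np.linalg.norm(X)  # initial error

for _ in range(200):
    # Each update multiplies elementwise by a non-negative ratio,
    # so W and H stay non-negative throughout.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)   # final error
```

Because every update is an elementwise multiplication by a non-negative factor, non-negativity is preserved without projection steps; this is the property the tensor extension inherits.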