In many statistical applications, data are collected from related but heterogeneous sources. These sources share some commonalities while also exhibiting idiosyncratic characteristics. More specifically, consider the setting where observation matrices $\{M_{i}\}_{i=1}^N$ from $N$ sources are generated from a few common factors and a few source-specific factors. Is it possible to recover both the shared and the source-specific factors? We show that, under appropriate conditions on the alignment of the source-specific factors, the problem is well-defined and both shared and source-specific factors are identifiable under a constrained matrix factorization objective. To solve this objective, we propose a new class of matrix factorization algorithms, called Heterogeneous Matrix Factorization (HMF). HMF is easy to implement, enjoys local linear convergence under suitable assumptions, and is intrinsically distributed. Through a variety of empirical studies, we showcase the advantageous properties of HMF and its potential application in feature extraction and change detection.
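To make the setting concrete, the following is a minimal sketch of one way observation matrices with shared and source-specific factors could be simulated. It is not the HMF algorithm and not necessarily the exact model of the paper: the additive low-rank form, the function name `make_sources`, all parameter names, and the choice to make every source-specific factor orthogonal to the shared factor and to the other sources' factors are assumptions made here for illustration. The abstract only states that some alignment condition on the source-specific factors is required.

```python
import numpy as np


def make_sources(n_sources=3, d=20, n_cols=30, r_shared=2, r_local=2,
                 noise=0.0, seed=0):
    """Simulate M_i = U_shared @ V_shared_i.T + U_i @ V_local_i.T + noise.

    Hypothetical illustration: all column factors are taken from a single
    orthonormal basis, so shared and source-specific column spaces are
    mutually orthogonal (a stronger condition than may be needed).
    """
    total = r_shared + n_sources * r_local
    if total > d:
        raise ValueError("need r_shared + n_sources * r_local <= d")

    rng = np.random.default_rng(seed)
    # Reduced QR of a Gaussian matrix gives `total` orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((d, total)))

    U_shared = Q[:, :r_shared]
    U_locals, Ms = [], []
    for i in range(n_sources):
        start = r_shared + i * r_local
        U_i = Q[:, start:start + r_local]
        # Each source has its own coefficients for both kinds of factors.
        V_shared_i = rng.standard_normal((n_cols, r_shared))
        V_local_i = rng.standard_normal((n_cols, r_local))
        M_i = U_shared @ V_shared_i.T + U_i @ V_local_i.T
        M_i = M_i + noise * rng.standard_normal((d, n_cols))
        U_locals.append(U_i)
        Ms.append(M_i)
    return U_shared, U_locals, Ms


if __name__ == "__main__":
    U_shared, U_locals, Ms = make_sources()
    # In the noiseless case each M_i has rank r_shared + r_local.
    print([int(np.linalg.matrix_rank(M)) for M in Ms])
    # Shared and source-specific column spaces are orthogonal by construction.
    print(float(np.abs(U_shared.T @ U_locals[0]).max()))
```

Under these assumptions, the recovery question posed in the abstract amounts to estimating the column space of `U_shared` and of each `U_i` from the matrices `Ms` alone.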