Factor models are widely used in the analysis of high-dimensional data in several fields of research. Estimating a factor model, in particular its covariance matrix, from partially observed data vectors is very challenging. In this work, we show that when the data are structurally incomplete, the factor model likelihood function can be decomposed into the product of the likelihood functions of multiple partial factor models relative to different subsets of data. If these multiple partial factor models are linked together by common parameters, then we can obtain complete maximum likelihood estimates of the factor model parameters and thereby the full covariance matrix. We call this framework Linked Factor Analysis (LINFA). LINFA can be used for covariance matrix completion, dimension reduction, data completion, and graphical dependence structure recovery. We propose an efficient Expectation-Maximization algorithm for maximum likelihood estimation, accelerated by a novel group vertex tessellation (GVT) algorithm which identifies a minimal partition of the vertex set to implement an efficient optimization in the maximization steps. We illustrate our approach in an extensive simulation study and in the analysis of calcium imaging data collected from mouse visual cortex.
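To make the decomposition concrete, the following is a minimal sketch under assumptions the abstract does not state: a zero-mean Gaussian factor model, independent samples, and missingness fixed by design. The notation is hypothetical and may differ from the paper's own. There are p variables and q factors, the variables are observed in K subsets V_1, ..., V_K, and n_k samples are observed on V_k only. Here \Lambda_{V_k} denotes the rows of \Lambda indexed by V_k, \Psi_{V_k} the matching diagonal block of \Psi, \phi(\cdot\,; 0, S) the N(0, S) density, and \lambda_j the j-th row of \Lambda written as a column vector.

```latex
% Assumed notation (not taken from the source): p variables, q factors,
% K observed subsets V_1, ..., V_K, with n_k samples observed on V_k only.
\begin{align*}
X &= \Lambda Z + \varepsilon, \quad Z \sim N(0, I_q), \quad
     \varepsilon \sim N(0, \Psi), \quad \Psi \text{ diagonal}, \\
\Sigma &= \Lambda \Lambda^\top + \Psi, \\
% The marginal on each observed subset is itself a (partial) factor model:
\Sigma_{V_k V_k} &= \Lambda_{V_k} \Lambda_{V_k}^\top + \Psi_{V_k}, \\
% The observed-data likelihood is a product of partial factor model likelihoods:
L(\Lambda, \Psi) &= \prod_{k=1}^{K} \prod_{i=1}^{n_k}
     \phi\!\left( x^{(k)}_{i} ;\, 0,\; \Lambda_{V_k} \Lambda_{V_k}^\top + \Psi_{V_k} \right), \\
% Off-diagonal covariance entries depend only on the loadings:
\Sigma_{jl} &= \lambda_j^\top \lambda_l \qquad (j \neq l).
\end{align*}
```

The last line suggests why linking matters: if variables j and l are never observed together, their covariance is still determined by loading rows that each appear in some partial model. Whether those rows can be estimated in a common factor basis depends on how the subsets share parameters, a condition this sketch does not cover.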