Factor models are widely applied to the analysis of multivariate data across disparate fields of research. However, modern scientific data are often incomplete, and estimating a factor model from partially observed data can be very challenging. In this work, we show that if the data are structurally incomplete, the factor model likelihood function can be decomposed into a product of likelihood functions for multiple factor models relative to different observed data subsets. If these factor models are linked together by common parameters, we can obtain complete maximum likelihood estimates of the full factor model parameters. We call this modeling framework Linked Factor Analysis (LINFA). LINFA can be used for covariance matrix completion, dependence estimation, dimension reduction, and data completion. We compute the maximum likelihood estimator through an efficient Expectation-Maximization algorithm, accelerated by a novel Group Vertex Tessellation algorithm. We establish the conditions for the consistency and asymptotic normality of the estimator. We design confidence regions, hypothesis tests, bootstrap algorithms, and methods for selecting the number of factors. Finally, we illustrate the application of LINFA in an extensive simulation study and in the analysis of neuroscience data.
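As a rough illustration of the likelihood decomposition described above, consider a Gaussian factor model. The notation here is assumed for illustration and is not taken from the paper: $\mu$ is the mean, $\Lambda$ the loading matrix with $q$ factors, $\Psi$ the diagonal noise covariance, and $\phi(\cdot\,; m, S)$ the multivariate normal density. Suppose the samples fall into $K$ groups, where group $k$ contains $n_k$ independent samples observed only on the variable subset $V_k$, and the missingness pattern does not depend on the data. Under these assumptions, the observed-data likelihood factorizes as follows:

```latex
\begin{align*}
X &= \mu + \Lambda Z + \varepsilon,
  \qquad Z \sim N(0, I_q), \quad \varepsilon \sim N(0, \Psi), \quad Z \perp \varepsilon, \\
\Sigma &= \Lambda \Lambda^\top + \Psi, \\
L(\mu, \Lambda, \Psi)
  &= \prod_{k=1}^{K} \prod_{i=1}^{n_k}
     \phi\!\left( x^{(k)}_{i} ;\; \mu_{V_k},\;
       \Lambda_{V_k} \Lambda_{V_k}^\top + \Psi_{V_k V_k} \right).
\end{align*}
```

Here $\Lambda_{V_k}$ denotes the rows of $\Lambda$ indexed by $V_k$, so each inner product is the likelihood of a smaller factor model on the variables in $V_k$. The sub-models share parameters wherever the subsets overlap. This is how a covariance entry $\Sigma_{ab} = \Lambda_a \Lambda_b^\top$ for a pair $(a, b)$ that is never jointly observed can still be determined, provided the linked sub-models identify the relevant rows of $\Lambda$.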