In this paper, we propose a distributed framework for dimension reduction of high-dimensional, large-scale, heterogeneous matrix-variate time series data based on a factor model. The data are first partitioned column-wise (or row-wise) and allocated to node servers, where each node estimates the row (or column) loading matrix via two-dimensional tensor PCA. These local estimates are then transmitted to a central server and aggregated, and a final PCA step yields the global row (or column) loading matrix estimator. Given the estimated loading matrices, the corresponding factor matrices are then computed. Unlike existing distributed approaches, our framework preserves the latent matrix structure, which improves computational efficiency and makes fuller use of the information in the data. We also discuss row- and column-wise clustering procedures for settings in which the group memberships are unknown. Furthermore, we extend the analysis to unit-root nonstationary matrix-variate time series. We derive the asymptotic properties of the proposed method as both the dimension of the data in each computing unit and the sample size $T$ diverge. Simulation studies assess the computational efficiency and estimation accuracy of the proposed framework, and real data applications further validate its predictive performance.
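The partition, local PCA, aggregation, and final PCA steps described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`top_eigvecs`, `local_loading`, `distributed_loading`, `estimate_factors`) are hypothetical, the local step is assumed to be an eigendecomposition of the scaled sum of $X_t X_t^\top$ over one block, the aggregation is assumed to be an average of the local projection matrices, and the loadings are assumed to be normalized to have orthonormal columns.

```python
import numpy as np

def top_eigvecs(sym, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    _, vecs = np.linalg.eigh(sym)      # eigh sorts eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def local_loading(X_block, k):
    """Node step (assumed form): PCA of sum_t X_t X_t^T on one column block.

    X_block has shape (T, p, q_j), where q_j is the number of columns on this node.
    """
    T, p, q_j = X_block.shape
    M = np.einsum("tij,tkj->ik", X_block, X_block) / (T * p * q_j)
    return top_eigvecs(M, k)

def distributed_loading(X, k, n_nodes):
    """Central step (assumed form): average local projections, then a final PCA.

    X has shape (T, p, q); columns are split into n_nodes contiguous blocks.
    Pass X.transpose(0, 2, 1) to estimate the column loading instead.
    """
    T, p, q = X.shape
    blocks = np.array_split(np.arange(q), n_nodes)
    P = np.zeros((p, p))
    for cols in blocks:                # in practice each block lives on its own node
        R_j = local_loading(X[:, :, cols], k)
        P += R_j @ R_j.T / n_nodes     # a projection matrix is invariant to rotation of R_j
    return top_eigvecs(P, k)

def estimate_factors(X, R_hat, C_hat):
    """Project each X_t onto the estimated loadings: F_t = R_hat^T X_t C_hat."""
    return np.einsum("pi,tpq,qj->tij", R_hat, X, C_hat)
```

Averaging projection matrices rather than the loading estimates themselves avoids having to align the sign or rotation of each node's eigenvectors before the final PCA. Only a $p \times k$ matrix per node needs to be sent to the central server in this sketch.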