Principal component analysis has long been a primary tool in multivariate analysis for estimating a low-dimensional linear subspace that explains most of the variability in the data. However, in high-dimensional regimes, naive estimates of the principal loadings are inconsistent and difficult to interpret. In the context of time series, principal component analysis of spectral density matrices can provide valuable, parsimonious information about the behavior of the underlying process, particularly if the principal components are interpretable in the sense that they are sparse in coordinates and localized in frequency bands. In this paper, we introduce a formulation and a consistent estimation procedure for interpretable principal component analysis of high-dimensional time series in the frequency domain. An efficient frequency-sequential algorithm is developed to compute sparse-localized estimates of the low-dimensional principal subspaces of the signal process. The method is motivated by, and used to investigate, neurological mechanisms from high-density resting-state EEG in a study of first-episode psychosis.
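For orientation, the classical (non-sparse) frequency-domain PCA that the abstract contrasts against can be sketched in two steps: estimate the p×p spectral density matrix at each Fourier frequency with a smoothed periodogram, then take the leading eigenvectors of each Hermitian matrix. This is a minimal NumPy sketch under stated assumptions: a flat (Daniell) smoothing window, and hypothetical helper names `smoothed_periodogram` and `frequency_pca` with a hypothetical `bandwidth` parameter. It is the standard baseline whose loadings become inconsistent in high dimensions, not the sparse-localized estimator or the frequency-sequential algorithm proposed in this paper.

```python
import numpy as np

def smoothed_periodogram(x, bandwidth=5):
    """Estimate spectral density matrices of a p-variate series x (shape T x p)
    at Fourier frequencies 2*pi*k/T, k = 0..T//2, by averaging periodogram
    matrices over 2*bandwidth+1 neighbouring frequencies (flat Daniell window).
    The window and bandwidth are illustrative assumptions."""
    T, p = x.shape
    x = x - x.mean(axis=0)                              # remove the sample mean
    d = np.fft.fft(x, axis=0) / np.sqrt(2 * np.pi * T)  # scaled DFT, shape T x p
    # Periodogram matrices I[k] = d_k d_k^*, each Hermitian and rank one.
    I = np.einsum('ki,kj->kij', d, d.conj())
    K = T // 2 + 1
    f = np.empty((K, p, p), dtype=complex)
    for k in range(K):
        # Circular window: negative frequencies wrap to the end of the DFT.
        idx = np.arange(k - bandwidth, k + bandwidth + 1) % T
        f[k] = I[idx].mean(axis=0)
    return f

def frequency_pca(f, r=1):
    """Return the r largest eigenvalues and the matching eigenvectors of each
    Hermitian matrix f[k] (classical, non-sparse principal components)."""
    vals, vecs = np.linalg.eigh(f)  # batched; eigenvalues in ascending order
    return vals[:, ::-1][:, :r], vecs[:, :, ::-1][:, :, :r]
```

In a toy simulation where a single persistent latent factor loads on only a few channels, the leading eigenvector at low frequencies concentrates on those channels. In genuinely high-dimensional settings, however, this unpenalized eigendecomposition spreads weight across all coordinates and frequencies, which is the interpretability and consistency problem that motivates sparse-localized estimation.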