Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon

This paper proposes a method for sparse latent factor modeling based on a new sparse asymptotic Principal Component Analysis (APCA). The approach analyzes the co-movements of large-dimensional panel data over the time horizon within a general approximate factor model framework. Unlike existing sparse factor modeling approaches based on sparse PCA, which assume sparse loading matrices, our sparse APCA assumes that the factor processes are sparse over the time horizon, while the corresponding loading matrices are not necessarily sparse. This development is motivated by the observation that sparse loadings may be an inappropriate assumption for financial returns, where exposure to market factors is generally universal and non-sparse. We propose a truncated power method to estimate the first sparse factor process and a sequential deflation method for the multi-factor case. We also develop a data-driven approach to identify the sparsity of risk factors over the time horizon using a novel cross-sectional cross-validation method. Theoretically, we establish that our estimators are consistent under mild conditions. Monte Carlo simulations demonstrate that the proposed method performs well in finite samples. Empirically, we analyze daily stock returns for a balanced panel of S&P 500 stocks from January 2004 to December 2016. Through textual analysis, we examine specific events associated with the identified sparse factors that systematically influence the stock market. Our approach offers a new pathway for economists to study and understand the systematic risks of economic and financial systems over time.
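
To make the truncated power step concrete, the following is a minimal sketch of a generic truncated power iteration applied to the T x T matrix XX'/N, where X is a T x N panel. It is an illustration under stated assumptions, not the paper's exact algorithm: the function name `truncated_power_method`, the initialization at the largest diagonal entry, the stopping rule, and the simulated panel are all hypothetical choices, and the sparsity level `k` is taken as known here rather than selected by cross-sectional cross-validation.

```python
import numpy as np

def truncated_power_method(A, k, n_iter=200, tol=1e-10):
    """Approximate the leading k-sparse unit eigenvector of a symmetric PSD matrix A.

    Hypothetical helper for illustration; not taken from the paper.
    """
    T = A.shape[0]
    # Assumed initialization: unit vector at the largest diagonal entry of A.
    v = np.zeros(T)
    v[np.argmax(np.diag(A))] = 1.0
    for _ in range(n_iter):
        w = A @ v                                   # power step
        keep = np.argpartition(np.abs(w), -k)[-k:]  # k largest-magnitude entries
        w_trunc = np.zeros(T)
        w_trunc[keep] = w[keep]                     # truncation step
        norm = np.linalg.norm(w_trunc)
        if norm == 0.0:
            break
        w_trunc /= norm                             # renormalize to unit length
        converged = np.linalg.norm(w_trunc - v) < tol
        v = w_trunc
        if converged:
            break
    return v

# Simulated panel (assumed design): one factor that is active in only the
# first k of T periods, with dense (non-sparse) loadings across N series.
rng = np.random.default_rng(0)
T, N, k = 60, 400, 10
f_true = np.zeros(T)
f_true[:k] = 2.0 * (-1.0) ** np.arange(k)
loadings = rng.normal(size=N)
X = np.outer(f_true, loadings) + rng.normal(size=(T, N))

A = X @ X.T / N                       # T x T matrix used in asymptotic PCA
f_hat = truncated_power_method(A, k)  # sparse factor direction, up to sign and scale
support = set(np.flatnonzero(f_hat).tolist())
```

In this sketch `f_hat` identifies the periods in which the factor is active; the factor itself is recovered only up to sign and scale, as is usual in factor models. Further factors could be obtained by deflating `A` with the estimated direction and repeating the iteration, which is one common reading of sequential deflation.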
