This paper studies high-dimensional canonical correlation analysis (CCA) with an emphasis on the vectors that define canonical variables. The paper shows that when two dimensions of data grow to infinity jointly and proportionally, the classical CCA procedure for estimating those vectors fails to deliver a consistent estimate. This provides the first result on the impossibility of identification of canonical variables in the CCA procedure when all dimensions are large. As a countermeasure, the paper derives the magnitude of the estimation error, which can be used in practice to assess the precision of CCA estimates. Applications of the results to cyclical vs. non-cyclical stocks and to a limestone grassland data set are provided.
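The inconsistency described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's procedure: it assumes a Gaussian model with identity covariances in which only the first coordinates of the two data sets are correlated (with correlation `rho`), so that the true leading canonical vector is the first basis vector. The function names `sample_cca` and `leading_vector_alignment`, the sample sizes, and the value of `rho` are illustrative assumptions.

```python
import numpy as np


def sample_cca(X, Y):
    """Classical sample CCA via QR and SVD.

    Returns the sample canonical correlations and the coefficient
    vectors (as columns) for X and Y. Requires n > max(p, q) + 1.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # reduced QR: Qx is n x p, Rx is p x p
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)     # canonical vectors for X
    B = np.linalg.solve(Ry, Vt.T)  # canonical vectors for Y
    return s, A, B


def leading_vector_alignment(n, p, q, rho, rng):
    """Simulate the assumed one-signal model and measure estimation quality.

    Returns the largest sample canonical correlation and the absolute
    cosine between the estimated leading vector for X and the true one (e1).
    """
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    # Assumption: only the first coordinates of X and Y are correlated.
    Y[:, 0] = rho * X[:, 0] + np.sqrt(1.0 - rho**2) * Y[:, 0]
    s, A, _ = sample_cca(X, Y)
    a_hat = A[:, 0]
    cosine = abs(a_hat[0]) / np.linalg.norm(a_hat)
    return s[0], cosine


rng = np.random.default_rng(0)
# Low-dimensional regime: dimensions fixed and small relative to n.
corr_low, align_low = leading_vector_alignment(n=1000, p=2, q=2, rho=0.8, rng=rng)
# Proportional regime: dimensions comparable to n.
corr_high, align_high = leading_vector_alignment(n=1000, p=400, q=400, rho=0.8, rng=rng)

print(f"p=q=2:   top correlation {corr_low:.3f}, alignment with truth {align_low:.3f}")
print(f"p=q=400: top correlation {corr_high:.3f}, alignment with truth {align_high:.3f}")
```

Under these assumptions, one would expect the alignment to be close to 1 when the dimensions are small relative to the sample size and noticeably lower when they are proportional to it, with the top sample correlation inflated above `rho`. The simulation only illustrates the phenomenon; the magnitude of the estimation error is what the paper derives.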