Theory of functional principal component analysis for discretely observed data

Functional data analysis is an important research field in statistics that treats data as random functions drawn from an infinite-dimensional function space, and functional principal component analysis (FPCA), based on eigen-decomposition, plays a central role in data reduction and representation. After nearly three decades of research, a key problem remains unsolved: the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This problem is fundamental for studying models and methods based on FPCA, yet there has been no substantial progress since the result of Hall, M\"uller and Wang (2006) for a fixed number of eigenfunction estimates. In this work, we aim to establish a unified theory for this problem, obtaining upper bounds for eigenfunctions with diverging indices in both the $\mathcal{L}^2$ and supremum norms, and deriving the asymptotic distributions of eigenvalues for a wide range of sampling schemes. Our results provide insight into when the $\mathcal{L}^{2}$ bound for eigenfunction estimates with diverging indices is minimax optimal, as if the curves were fully observed, and reveal the transition of convergence rates from nonparametric to parametric regimes in connection with sparse or dense sampling. We also develop a double truncation technique to handle the uniform convergence of the estimated covariance and eigenfunctions. The technical arguments in this work are useful for handling perturbation series with noisy and discretely observed functional data, and can be applied to models, including those involving inverse problems, that use FPCA as regularization, such as functional linear regression.
