Theory of functional principal component analysis for noisy and discretely observed data

Functional data analysis is an important research field in statistics that treats data as random functions drawn from an infinite-dimensional function space, and functional principal component analysis (FPCA), based on eigen-decomposition, plays a central role in data reduction and representation. After nearly three decades of research, a key problem remains unsolved: the perturbation analysis of the covariance operator for a diverging number of eigencomponents obtained from noisy and discretely observed data. This analysis is fundamental for studying models and methods based on FPCA, yet there has been no substantial progress since the result of Hall, M\"uller and Wang (2006) for a fixed number of eigenfunction estimates. In this work, we aim to establish a unified theory for this problem, deriving moment bounds for eigenfunctions and asymptotic distributions for eigenvalues under a wide range of sampling schemes. Our results provide insight into when the $\mathcal{L}^{2}$ bound of eigenfunction estimates with diverging indices is minimax optimal, as if the curves were fully observed, and reveal the transition of convergence rates from nonparametric to parametric regimes in connection with sparse or dense sampling. We also propose a double truncation technique to derive, for the first time, the uniform convergence (in the time domain) of the estimated eigenfunctions. The technical arguments in this work are useful for handling perturbation series with noisy and discretely observed data, and can be applied to models or inverse problems that use FPCA as regularization, such as functional linear regression.
