Dimension reduction is often necessary in functional data analysis, and functional principal component analysis is one of the most widely used techniques. A key challenge in applying these methods is determining the number of eigen-pairs to retain, a problem known as order determination. When the covariance function admits a finite representation, the problem reduces to estimating the rank of the associated covariance operator. This is straightforward when the full trajectories of the functional data are available, but in practice functional data are typically collected discretely and are contaminated by measurement error. The contamination introduces a ridge along the diagonal of the empirical covariance function, which obscures the true rank of the covariance operator. We propose a novel procedure that identifies the true rank of the covariance operator by leveraging information from both the eigenvalues and the eigenfunctions. Because it accommodates the nonparametric nature of functional data through smoothing techniques, the method is applicable to functional data collected at random, subject-specific points. Extensive simulation studies show that our approach performs well across a wide range of settings, outperforms commonly used information-criterion-based methods, and remains effective even in high-noise scenarios. We further illustrate the method with two real-world data examples.
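To make the ridge effect concrete, the following minimal sketch simulates finite-rank functional data on a common dense grid and compares the numerical rank of the sample covariance with and without measurement error. It is only an illustration of why noise obscures the rank, not the procedure proposed here. The sine basis, eigenvalue decay, noise level, grid, and the helper names `simulate_covariances` and `numerical_rank` are all assumptions made for this example.

```python
import numpy as np


def simulate_covariances(n=500, m=50, rank=3, sigma=0.5, seed=0):
    """Return sample covariances of clean and noisy discretized curves.

    Hypothetical setup: curves are random combinations of `rank` sine
    functions observed on a common grid of `m` points in [0, 1].
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    # Assumed eigenfunctions: sqrt(2) * sin(k * pi * t), k = 1..rank
    basis = np.stack(
        [np.sqrt(2.0) * np.sin((k + 1) * np.pi * t) for k in range(rank)]
    )  # shape (rank, m)
    # Assumed eigenvalues: 1, 1/4, 1/9, ...
    eigvals = 1.0 / (np.arange(rank) + 1.0) ** 2
    scores = rng.standard_normal((n, rank)) * np.sqrt(eigvals)
    clean = scores @ basis  # shape (n, m), exactly rank `rank`
    # Independent measurement error at every observation point
    noisy = clean + sigma * rng.standard_normal((n, m))
    cov_clean = np.cov(clean, rowvar=False)
    cov_noisy = np.cov(noisy, rowvar=False)
    return cov_clean, cov_noisy


def numerical_rank(cov, tol=1e-8):
    """Count eigenvalues larger than `tol` times the largest eigenvalue."""
    ev = np.linalg.eigvalsh(cov)
    return int(np.sum(ev > tol * ev.max()))


cov_clean, cov_noisy = simulate_covariances()
print("rank without noise:", numerical_rank(cov_clean))
print("rank with noise:   ", numerical_rank(cov_noisy))
```

Under these assumptions, the noise-free covariance has exactly as many non-negligible eigenvalues as basis functions, whereas the noisy covariance is numerically full rank because the error variance lifts every eigenvalue away from zero. A naive eigenvalue threshold therefore cannot recover the order once measurement error is present.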