Functional data analysis (FDA) almost always involves smoothing discrete observations into curves, because the curves are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting them are not well developed, hampered by the difficulty of exploiting the information shared across curves while remaining computationally efficient. On the one hand, smoothing each curve in isolation, however sophisticated the method, ignores useful signals present in the other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally infeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing method, specifically tailored for functional principal components analysis (FPCA), through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient, bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the adaptive eigen-element estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic characteristics of real data sets, supports our methodological contribution, which is available for use in the R package FDAdapt.
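To make the pipeline described in the abstract concrete, the following is a minimal sketch of the generic "smooth, then FPCA" workflow on simulated data with an independent design. It is not the adaptive procedure proposed in the paper and it does not use the FDAdapt package: the bandwidth `h` is fixed by hand rather than chosen by minimizing a risk bound, the smoother is a plain Nadaraya-Watson estimator with a Gaussian kernel, and every name in the code (`nw_smooth`, `leading_eigenvector`, `estimate_leading_eigenfunction`, the simulation settings) is a hypothetical choice made for illustration only.

```python
import math
import random


def nw_smooth(t_obs, y_obs, grid, h):
    """Nadaraya-Watson estimate on `grid` with a Gaussian kernel and bandwidth h."""
    fitted = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - s) / h) ** 2) for s in t_obs]
        total = sum(weights)
        fitted.append(sum(w * y for w, y in zip(weights, y_obs)) / total)
    return fitted


def leading_eigenvector(curves, n_iter=100):
    """Leading eigenvector of the empirical covariance, by power iteration."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    v = centered[0][:]  # start vector; assumed not orthogonal to the leading mode
    for _ in range(n_iter):
        # Apply C v = (1/n) * sum_i <x_i, v> x_i without forming the m x m matrix.
        scores = [sum(x[j] * v[j] for j in range(m)) for x in centered]
        v = [sum(s * x[j] for s, x in zip(scores, centered)) / n for j in range(m)]
        norm = math.sqrt(sum(vj * vj for vj in v))
        v = [vj / norm for vj in v]  # unit Euclidean norm on the grid
    return v


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / den


def estimate_leading_eigenfunction(seed=0, n_curves=100, n_points=30, h=0.1):
    """Simulate noisy curves on an independent design, smooth them, run FPCA."""
    rng = random.Random(seed)
    grid = [j / 50 for j in range(51)]
    smoothed = []
    for _ in range(n_curves):
        a = rng.gauss(0.0, 1.0)   # score on sin(pi t), the dominant mode
        b = rng.gauss(0.0, 0.25)  # score on sin(2 pi t), a weaker mode
        t_obs = [rng.random() for _ in range(n_points)]  # curve-specific design
        y_obs = [a * math.sin(math.pi * t) + b * math.sin(2 * math.pi * t)
                 + rng.gauss(0.0, 0.2) for t in t_obs]
        smoothed.append(nw_smooth(t_obs, y_obs, grid, h))
    return grid, leading_eigenvector(smoothed)


grid, phi = estimate_leading_eigenfunction(seed=0)
target = [math.sin(math.pi * t) for t in grid]
# Agreement (up to sign) between the estimate and the simulated leading mode.
print(round(abs(cosine(phi, target)), 3))
```

In this sketch a single hand-picked bandwidth serves every curve and every eigen-element. The abstract's point is that this choice matters for the downstream eigen-element estimates, which is what motivates selecting the bandwidth separately for each eigen-element from a risk bound instead of fixing it or cross-validating over the pooled data.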