Functional data analysis almost always involves smoothing discrete observations into curves, because curves are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well developed, hampered by the difficulty of using all the information shared by the curves while remaining computationally efficient. On the one hand, smoothing each curve in isolation, however sophisticated the method, ignores useful signals present in the other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally infeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing method, specifically tailored for functional principal components analysis and obtained through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient, bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets, supports our methodological contribution. An application to real data illustrates the method.
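To make the setting concrete, the following is a minimal sketch of the generic pipeline the abstract refers to: noisy, discretely observed curves are kernel-smoothed one at a time, and the eigen-elements are then estimated from the empirical covariance of the smoothed curves. It is not the bandwidth rule proposed in the paper. The function names `nw_smooth` and `fpca_on_grid`, the Gaussian kernel, the toy two-component data under an independent design, and the hand-picked bandwidth `h = 0.05` are all assumptions made for illustration.

```python
import numpy as np

def nw_smooth(t_obs, y_obs, grid, h):
    """Nadaraya-Watson estimate of one curve on `grid` (Gaussian kernel, bandwidth h)."""
    # u[i, j] = (grid[i] - t_obs[j]) / h, built by broadcasting
    u = (grid[:, None] - t_obs[None, :]) / h
    w = np.exp(-0.5 * u**2)
    # Weighted average of the observations at each grid point
    return (w @ y_obs) / w.sum(axis=1)

def fpca_on_grid(curves, grid, n_components):
    """Leading eigenvalues/eigenfunctions of the empirical covariance on a uniform grid."""
    dt = grid[1] - grid[0]
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / curves.shape[0]
    # Riemann-sum discretization of the covariance operator
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:n_components]
    # Rescale so that sum(f**2) * dt == 1 for each eigenfunction
    return evals[order], evecs[:, order].T / np.sqrt(dt)

# Toy data (hypothetical): two-component curves observed with noise at
# random times, i.e. an independent design.
rng = np.random.default_rng(0)
n_curves, n_points, noise_sd = 200, 50, 0.2
grid = np.linspace(0.0, 1.0, 101)
h = 0.05  # hand-picked bandwidth, NOT a data-driven rule

smoothed = np.empty((n_curves, grid.size))
for i in range(n_curves):
    t = np.sort(rng.uniform(0.0, 1.0, n_points))
    a, b = rng.normal(0.0, 2.0), rng.normal(0.0, 1.0)
    signal = a * np.sqrt(2) * np.sin(2 * np.pi * t) + b * np.sqrt(2) * np.cos(2 * np.pi * t)
    y = signal + rng.normal(0.0, noise_sd, n_points)
    smoothed[i] = nw_smooth(t, y, grid, h)

eigenvalues, eigenfunctions = fpca_on_grid(smoothed, grid, n_components=2)
print(eigenvalues)
```

In this sketch a single bandwidth is fixed by hand and shared by all curves and all eigen-elements. Changing `h` changes the estimated eigenvalues and eigenfunctions, which is the sensitivity that motivates choosing a bandwidth separately for each eigen-element.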