In the setting of functional data analysis, we derive optimal rates of convergence in the supremum norm for estimating the H\"older-smooth mean function of a stochastic process which is repeatedly and discretely observed at fixed, multivariate, synchronous design points and with additional errors. Similarly to the rates in $L_2$ obtained in Cai and Yuan (2011), for sparse design a discretization term dominates, while in the dense case the $\sqrt n$ rate can be achieved as if the $n$ processes were continuously observed without errors. However, our analysis differs in several respects from that of Cai and Yuan (2011). First, we do not assume that the paths of the processes are as smooth as the mean, but still obtain the $\sqrt n$ rate of convergence without additional logarithmic factors in the dense setting. Second, we show that in the supremum norm, there is an intermediate regime between the sparse and dense cases which is dominated by the contribution of the observation errors. Third, and in contrast to the analysis in $L_2$, interpolation estimators turn out to be sub-optimal in $L_\infty$ in the dense setting, which explains their poor empirical performance. We also obtain a central limit theorem in the supremum norm and discuss the selection of the bandwidth. Simulations and real data applications illustrate the results.