Traditional static functional data analysis faces new challenges from streaming data, where observations arrive continuously. A major challenge is that storing an ever-growing volume of data in memory is nearly impossible. In addition, existing inferential tools in online learning are developed mainly for finite-dimensional problems, while inference methods for functional data focus on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and well suited to settings where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. The perturbed estimator from the bootstrap procedure is shown to enjoy the same theoretical properties, which provides the theoretical justification for our online inference tool. As far as we know, this is the first inference result for the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application to the Beijing multi-site air-quality data illustrates the method.
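To make the one-pass idea concrete, the following is a minimal sketch of a streaming update for functional linear regression with a multiplier-style online bootstrap. It is not the paper's algorithm: it assumes curves observed on a common equispaced grid on [0, 1], a plain discretized L2 gradient step (no kernel or smoothing operator), a polynomially decaying step size, iterate averaging, and Exp(1) perturbation weights with mean one. The class name `OnlineFLR` and all parameter names and defaults are hypothetical choices for illustration.

```python
import numpy as np


class OnlineFLR:
    """Illustrative one-pass SGD for Y = integral of X(t) * beta(t) dt + noise.

    Assumptions (not taken from the paper): curves are sampled on a common
    equispaced grid, the integral is approximated by a grid average, and
    bootstrap replicates reweight each gradient by an Exp(1) draw.
    """

    def __init__(self, grid_size, n_boot=50, step0=0.5, decay=0.5, seed=0):
        self.m = grid_size
        self.n = 0                                  # observations seen so far
        self.step0, self.decay = step0, decay
        self.rng = np.random.default_rng(seed)
        self.beta = np.zeros(grid_size)             # current SGD iterate
        self.beta_bar = np.zeros(grid_size)         # running average of iterates
        self.boot = np.zeros((n_boot, grid_size))   # perturbed iterates
        self.boot_bar = np.zeros((n_boot, grid_size))

    def update(self, x, y):
        """Consume one (curve, response) pair; nothing is stored afterwards."""
        self.n += 1
        gamma = self.step0 * self.n ** (-self.decay)

        # Main estimator: gradient step on the squared loss, then averaging.
        resid = y - np.mean(x * self.beta)          # grid average ~ integral
        self.beta += gamma * resid * x
        self.beta_bar += (self.beta - self.beta_bar) / self.n

        # Bootstrap replicates: same step, gradient scaled by random weights.
        w = self.rng.exponential(1.0, size=self.boot.shape[0])
        resid_b = y - self.boot @ x / self.m        # one residual per replicate
        self.boot += gamma * (w * resid_b)[:, None] * x[None, :]
        self.boot_bar += (self.boot - self.boot_bar) / self.n

    def pointwise_band(self, level=0.95):
        """Pointwise percentile band from the averaged bootstrap replicates."""
        alpha = 1.0 - level
        lo, hi = np.quantile(self.boot_bar, [alpha / 2, 1 - alpha / 2], axis=0)
        return lo, hi


if __name__ == "__main__":
    # Synthetic stream (hypothetical data-generating process for the demo).
    rng = np.random.default_rng(42)
    t = np.linspace(0.0, 1.0, 50)
    beta_true = np.sqrt(2.0) * np.cos(np.pi * t)
    model = OnlineFLR(grid_size=t.size)
    for _ in range(5000):
        scores = rng.normal(size=3) / np.arange(1, 4)
        x = sum(s * np.sqrt(2.0) * np.cos(k * np.pi * t)
                for k, s in enumerate(scores, start=1))
        y = np.mean(x * beta_true) + 0.1 * rng.normal()
        model.update(x, y)
    lo, hi = model.pointwise_band()
    print("mean band width:", float(np.mean(hi - lo)))
```

Each call to `update` touches only the current observation and a fixed amount of state, so memory use does not grow with the length of the stream. The percentile band here is only one way to turn the perturbed estimates into an interval; whether it is valid depends on conditions this sketch does not check.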