We propose a novel and robust online function-on-scalar regression method, based on the geometric median, for learning associations between functional responses and scalar covariates from massive or streaming datasets. The online estimation procedure, built on the averaged stochastic gradient descent algorithm, offers an efficient and cost-effective way to analyze sequentially arriving data without storing large volumes of data in memory. We establish the almost sure consistency, $L_p$ convergence, and asymptotic normality of the online estimator. To enable fast and efficient inference on the parameters of interest, including the construction of confidence intervals, we also develop a two-step online bootstrap procedure that approximates the limiting error distribution of the robust online estimator. Numerical studies under a variety of scenarios demonstrate the effectiveness and efficiency of the proposed online learning method. An application to PM$_{2.5}$ air-quality data illustrates the proposed online approach.
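To make the idea of an averaged stochastic gradient update for a geometric-median loss concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes the functional responses are observed on a common grid of `m` points, approximates the $L_2$ norm of a residual curve by its root mean square on that grid, and uses a step size $\gamma_n = c\,n^{-\alpha}$ with $\alpha \in (1/2, 1)$. The class name `OnlineGeometricMedianFoS`, the parameters `c` and `alpha`, and the simulated data in `demo` are all hypothetical choices made for illustration; the two-step online bootstrap is not shown.

```python
import numpy as np


class OnlineGeometricMedianFoS:
    """Illustrative averaged-SGD estimator for function-on-scalar regression
    under a geometric-median (norm) loss. Hypothetical sketch only."""

    def __init__(self, p, m, c=1.0, alpha=0.75):
        self.B = np.zeros((p, m))      # current SGD iterate (p covariates, m grid points)
        self.B_bar = np.zeros((p, m))  # running average of the iterates
        self.n = 0                     # number of observations processed so far
        self.c, self.alpha = c, alpha  # assumed step size: c * n ** (-alpha)

    def update(self, x, y):
        """Process one observation: x has shape (p,), y has shape (m,)."""
        self.n += 1
        r = y - x @ self.B               # residual curve on the grid
        norm = np.sqrt(np.mean(r ** 2))  # grid approximation of the L2 norm
        if norm > 0:
            gamma = self.c * self.n ** (-self.alpha)
            # Stochastic gradient step for the loss ||y - B^T x||
            self.B += gamma * np.outer(x, r / norm)
        # Recursive average; only B, B_bar and n are kept in memory
        self.B_bar += (self.B - self.B_bar) / self.n
        return self.B_bar


def demo(n=5000, m=20, seed=0):
    """Simulated stream with 5% symmetric gross outliers; returns max abs error."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    B_true = np.vstack([np.sin(2 * np.pi * t), t])  # intercept and slope curves
    est = OnlineGeometricMedianFoS(p=2, m=m)
    for _ in range(n):
        x = np.array([1.0, rng.normal()])
        scale = 20.0 if rng.random() < 0.05 else 1.0  # occasional outlying curve
        y = x @ B_true + scale * rng.normal(size=m)
        est.update(x, y)  # each observation is discarded after the update
    return float(np.max(np.abs(est.B_bar - B_true)))


if __name__ == "__main__":
    print(demo())
```

Each call to `update` touches only the current observation and two `p × m` arrays, which is what lets such a scheme run on a stream without storing past data. Normalizing the residual by its norm bounds the influence of any single curve, which is the source of the robustness to outliers in this sketch.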