Stochastic approximation (SA) is a powerful and scalable computational method for iteratively estimating the solution of optimization problems in the presence of randomness, and it is particularly well suited to large-scale and streaming data settings. In this work, we propose a theoretical framework for SA applied to non-parametric least squares in reproducing kernel Hilbert spaces (RKHS), enabling online statistical inference in non-parametric regression models. We achieve this by applying an online multiplier bootstrap to the functional stochastic gradient descent (SGD) algorithm in the RKHS, which yields asymptotically valid pointwise confidence intervals for local inference and simultaneous confidence bands for global inference on the nonlinear regression function. Our main theoretical contributions are a unified framework for characterizing the non-asymptotic behavior of the functional SGD estimator and a proof of the consistency of the multiplier bootstrap method. The proof techniques involve a higher-order expansion of the functional SGD estimator under the supremum norm and a Gaussian approximation of suprema of weighted, non-identically distributed empirical processes. In particular, our theory reveals a relationship between the tuning of step sizes in SGD for estimation and the accuracy of uncertainty quantification.
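The procedure described above can be illustrated with a minimal sketch. The abstract does not state the update rule, so everything here is an assumption made for illustration: a Gaussian kernel, a polynomially decaying step size `gamma0 * i**(-decay)`, Polyak averaging of the iterates, and i.i.d. exponential multiplier weights with mean 1 and variance 1. The names `gaussian_kernel` and `functional_sgd_bootstrap` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between two 1-D arrays of points (assumed kernel)."""
    diff = np.asarray(a)[:, None] - np.asarray(b)[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)


def functional_sgd_bootstrap(x, y, grid, n_boot=100, gamma0=0.5,
                             decay=0.5, level=0.95, seed=0):
    """One pass of kernel SGD plus perturbed copies driven by random multipliers.

    Returns the averaged estimate on `grid`, pointwise intervals, and a
    simultaneous band, each as arrays over the grid.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Row 0 is the unperturbed SGD path; rows 1..n_boot are bootstrap paths.
    coef = np.zeros((n_boot + 1, n))             # kernel expansion coefficients
    cur = np.zeros((n_boot + 1, len(grid)))      # current iterates on the grid
    avg = np.zeros((n_boot + 1, len(grid)))      # running (Polyak) averages

    for i in range(n):
        gamma = gamma0 * (i + 1) ** (-decay)     # assumed step-size schedule
        k_past = gaussian_kernel(x[:i], x[i:i + 1])[:, 0]
        pred = coef[:, :i] @ k_past              # each path's prediction at x[i]
        # Multipliers: 1 for the original path, Exp(1) draws for the others.
        w = np.concatenate(([1.0], rng.exponential(1.0, n_boot)))
        step = gamma * w * (y[i] - pred)         # weighted functional gradient step
        coef[:, i] = step
        cur += step[:, None] * gaussian_kernel(x[i:i + 1], grid)
        avg += (cur - avg) / (i + 1)

    estimate = avg[0]
    dev = avg[1:] - estimate                     # bootstrap deviations
    alpha = 1.0 - level
    lo_q, hi_q = np.quantile(dev, [alpha / 2, 1 - alpha / 2], axis=0)
    pointwise = (estimate - hi_q, estimate - lo_q)
    # Simultaneous band from the quantile of the sup-norm deviation.
    sup_q = np.quantile(np.abs(dev).max(axis=1), level)
    band = (estimate - sup_q, estimate + sup_q)
    return estimate, pointwise, band
```

Each observation is visited once, and the bootstrap paths are updated alongside the original path, so no data need to be stored beyond what the kernel expansion itself requires. The constants `gamma0` and `decay` control the step sizes whose tuning the abstract links to the accuracy of uncertainty quantification; the defaults here are arbitrary choices, not recommendations.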