This paper addresses the challenge of integrating sequentially arriving data into the quantile regression framework, where the number of features may grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. Our analysis reveals a delicate interplay between updating the iterates with individual observations and updating them with batches of observations, with distinct regularity properties in each scenario. The method ensures long-term optimal estimation regardless of the chosen update strategy. Importantly, our contributions go beyond prior work by establishing exponential-type concentration inequalities and attaining optimal regret and error rates that exhibit only \textsf{short-term} sensitivity to initial errors. A key insight of our delicate statistical analysis is that appropriate stepsize schemes significantly mitigate the impact of initial errors on subsequent errors and regrets. This underscores the robustness of stochastic sub-gradient descent to initial uncertainties and its efficacy when the sequential arrival of data leaves both the horizon and the total number of observations uncertain. In addition, when the initial error rate is well controlled, there is a trade-off between the short-term error rate and long-term optimality. Because a comparably delicate statistical analysis is lacking for the squared loss, we also briefly discuss its properties and appropriate schemes. Extensive simulations support our theoretical findings.
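To make the update concrete, here is a minimal sketch of streaming stochastic sub-gradient descent on the check loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$. It assumes a polynomially decaying stepsize $\eta_t = c\,t^{-\alpha}$; this schedule and the names `sgd_quantile`, `subgradient`, `c`, and `alpha` are hypothetical illustrations, not the paper's specific scheme. With `batch_size=1` each observation triggers an update; larger values average sub-gradients over a batch before updating. Only the current iterate and one batch are held in memory, and the loop never needs the horizon in advance.

```python
import random


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def subgradient(x, y, beta, tau):
    """One subgradient of rho_tau(y - x^T beta) with respect to beta."""
    r = y - sum(xi * bi for xi, bi in zip(x, beta))
    # -x * (tau - 1{r < 0}); at r == 0 this picks w = tau, which lies in
    # the subdifferential interval [tau - 1, tau], so it is a valid choice.
    w = tau - (1.0 if r < 0 else 0.0)
    return [-w * xi for xi in x]


def sgd_quantile(stream, dim, tau, batch_size=1, c=1.0, alpha=0.5, beta0=None):
    """Streaming sub-gradient descent with one update per (mini-)batch.

    Hypothetical stepsize eta_t = c * t**(-alpha), where t counts updates.
    A trailing partial batch at the end of the stream is discarded.
    """
    beta = list(beta0) if beta0 is not None else [0.0] * dim
    batch, t = [], 0
    for x, y in stream:
        batch.append((x, y))
        if len(batch) < batch_size:
            continue
        t += 1
        eta = c * t ** (-alpha)
        # Average the sub-gradients over the current batch.
        g = [0.0] * dim
        for xb, yb in batch:
            for j, gj in enumerate(subgradient(xb, yb, beta, tau)):
                g[j] += gj / len(batch)
        beta = [b - eta * gj for b, gj in zip(beta, g)]
        batch = []  # free the batch: memory stays O(batch_size * dim)
    return beta


# Usage: median regression (tau = 0.5) on y = 1 + 2 z + Gaussian noise,
# so the target coefficients are approximately (1, 2).
rng = random.Random(0)


def make_stream(n):
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        yield [1.0, z], 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)


beta_single = sgd_quantile(make_stream(50000), dim=2, tau=0.5)
beta_batch = sgd_quantile(make_stream(50000), dim=2, tau=0.5, batch_size=10)
```

Both runs should land near $(1, 2)$; the batched run makes ten times fewer updates, each with a less noisy averaged sub-gradient.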