Streaming data often exhibit heterogeneity arising from heteroscedastic variances or inhomogeneous covariate effects. Online renewable quantile and expectile regression methods are valuable tools for detecting such heterogeneity because they combine the current data batch with summary statistics from historical data. However, quantile regression can be computationally demanding because its check loss function is non-smooth. To address this, we propose a novel online renewable method based on expectile regression, which updates the estimates using only the current observations and historical summaries, thereby reducing storage requirements. By exploiting the smoothness of the expectile loss function, our approach is computationally more efficient than existing online renewable methods for streaming data with heteroscedastic variances or inhomogeneous covariate effects. We establish the consistency and asymptotic normality of the proposed estimator under mild regularity conditions and show that it attains the same statistical efficiency as the oracle estimator based on the full individual-level data. Numerical experiments and real-data applications confirm that our method performs comparably to the oracle estimator while maintaining high computational efficiency and low storage cost.
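To make the contrast between the two losses and the idea of a renewable update concrete, the following is a minimal Python sketch, not the estimator proposed in the paper. It assumes a single covariate with an intercept, and it assumes a generic renewable update in which each batch is combined with an accumulated weighted Gram matrix `J_prev` and the previous estimate `beta_prev`. The function names (`renew`, `stream_fit`, `weighted_summaries`, `solve_2x2`) and the simulated data are all hypothetical and chosen only for illustration.

```python
import random

def expectile_weight(residual, tau):
    """Asymmetric weight: tau for non-negative residuals, 1 - tau otherwise."""
    return tau if residual >= 0 else 1.0 - tau

def expectile_loss(residual, tau):
    """Asymmetric squared loss; differentiable everywhere, including at 0."""
    return expectile_weight(residual, tau) * residual ** 2

def check_loss(residual, tau):
    """Quantile check loss; it has a kink (no derivative) at residual = 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def solve_2x2(a, b, c, r0, r1):
    """Solve [[a, b], [b, c]] @ beta = [r0, r1] for beta = (intercept, slope)."""
    det = a * c - b * b
    return ((c * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def weighted_summaries(xs, ys, beta, tau):
    """Weighted Gram matrix X'WX (as a, b, c) and X'Wy for one batch."""
    a = b = c = r0 = r1 = 0.0
    for x, y in zip(xs, ys):
        w = expectile_weight(y - beta[0] - beta[1] * x, tau)
        a += w
        b += w * x
        c += w * x * x
        r0 += w * y
        r1 += w * x * y
    return (a, b, c), (r0, r1)

def renew(beta_prev, J_prev, xs, ys, tau, n_iter=50, tol=1e-10):
    """Assumed renewable step: solve J_prev (beta_prev - beta) + U_batch(beta) = 0
    by iteratively reweighted least squares; only (beta, J) are stored."""
    # Historical contribution J_prev @ beta_prev; raw historical data is not needed.
    h0 = J_prev[0] * beta_prev[0] + J_prev[1] * beta_prev[1]
    h1 = J_prev[1] * beta_prev[0] + J_prev[2] * beta_prev[1]
    beta = beta_prev
    for _ in range(n_iter):
        (a, b, c), (r0, r1) = weighted_summaries(xs, ys, beta, tau)
        new = solve_2x2(J_prev[0] + a, J_prev[1] + b, J_prev[2] + c,
                        h0 + r0, h1 + r1)
        converged = max(abs(new[0] - beta[0]), abs(new[1] - beta[1])) < tol
        beta = new
        if converged:
            break
    # Accumulate the batch's weighted Gram matrix into the running summary.
    (a, b, c), _ = weighted_summaries(xs, ys, beta, tau)
    return beta, (J_prev[0] + a, J_prev[1] + b, J_prev[2] + c)

def stream_fit(batches, tau):
    """Process batches one at a time, keeping only the estimate and J."""
    beta, J = (0.0, 0.0), (0.0, 0.0, 0.0)
    for xs, ys in batches:
        beta, J = renew(beta, J, xs, ys, tau)
    return beta, J

if __name__ == "__main__":
    rng = random.Random(1)
    batches = []
    for _ in range(5):
        xs = [rng.uniform(0, 2) for _ in range(200)]
        # Simulated heteroscedastic noise: the spread grows with x.
        ys = [1 + 2 * x + (0.5 + x) * rng.gauss(0, 1) for x in xs]
        batches.append((xs, ys))
    for tau in (0.1, 0.5, 0.9):
        beta, _ = stream_fit(batches, tau)
        print(tau, beta)
```

Because the expectile loss is a weighted squared loss, each inner iteration reduces to a weighted least-squares solve, whereas the check loss would require handling the kink at zero. In this sketch, setting `tau = 0.5` makes all weights equal, so the streamed estimate coincides with ordinary least squares on the pooled data, which is a convenient sanity check. Under heteroscedastic noise, fitted slopes that differ across values of `tau` are the signal that expectile regression is meant to pick up.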