Large-scale streaming data are common in modern machine learning applications and have driven the development of online learning algorithms. Many fields, such as supply chain management, weather and meteorology, energy markets, and finance, have pivoted toward probabilistic forecasting. This creates a need to learn not only the expected value accurately but also the conditional heteroskedasticity and conditional moments. Against this backdrop, we present a methodology for online estimation of regularized, linear distributional models. The proposed algorithm combines recent developments in online estimation of LASSO models with the well-known GAMLSS framework. We provide a case study on day-ahead electricity price forecasting, in which we show that incremental estimation achieves competitive performance at a strongly reduced computational cost. Our algorithms are implemented in the computationally efficient Python package ondil.
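To make the idea of online estimation of a LASSO model concrete, the following is a minimal sketch of one common approach, not the ondil implementation or its API. It keeps running sufficient statistics (the Gram matrix and the feature-target cross-products), optionally discounts old observations with a forgetting factor, and refreshes the coefficients by warm-started coordinate descent after each new observation. The class name `OnlineLasso` and the parameters `lam`, `forget`, and `n_sweeps` are hypothetical names chosen for illustration. The sketch covers only a squared-error model for the mean; the distributional (GAMLSS) part, which would also model further distribution parameters such as scale, is not shown.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


class OnlineLasso:
    """Illustrative online LASSO (hypothetical helper, not the ondil API).

    Minimizes 0.5 * sum_i w_i * (y_i - x_i @ b)**2 + lam * ||b||_1,
    where w_i = forget ** (n - i) discounts older observations.
    """

    def __init__(self, n_features, lam=1.0, forget=1.0, n_sweeps=10):
        self.gram = np.zeros((n_features, n_features))  # running sum of x x^T
        self.xy = np.zeros(n_features)                  # running sum of x * y
        self.coef = np.zeros(n_features)
        self.lam = lam
        self.forget = forget
        self.n_sweeps = n_sweeps

    def update(self, x, y):
        """Absorb one observation and refresh the coefficients."""
        x = np.asarray(x, dtype=float)
        # Discount the old statistics, then add the new observation.
        self.gram = self.forget * self.gram + np.outer(x, x)
        self.xy = self.forget * self.xy + x * y
        # Warm-started coordinate descent on the stored statistics only;
        # the cost per update does not depend on the number of past samples.
        for _ in range(self.n_sweeps):
            for j in range(self.coef.size):
                if self.gram[j, j] == 0.0:
                    continue  # feature has not been observed yet
                rho = (
                    self.xy[j]
                    - self.gram[j] @ self.coef
                    + self.gram[j, j] * self.coef[j]
                )
                self.coef[j] = soft_threshold(rho, self.lam) / self.gram[j, j]
        return self.coef
```

Calling `update(x, y)` once per incoming observation returns the current coefficient vector. Because each update touches only the fixed-size statistics, the model never needs to be refit on the full history, which is the source of the computational savings that incremental estimation aims for.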