Hierarchical Forecasting at Scale

Existing hierarchical forecasting techniques scale poorly as the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it provides practitioners with a method of producing bottom-level forecasts that are coherent with any chosen cross-sectional or temporal hierarchy. In addition, removing the post-processing step required by traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% better in RMSE than the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in a 2% improvement in forecasting performance at the product level. Finally, we found an increase in forecasting performance of about 5-10% when evaluating across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
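
The general idea of forecasting only the bottom level and scoring those forecasts after aggregating them through the hierarchy can be sketched with a sparse summing matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an unweighted squared-error loss over every series in the hierarchy (the paper's actual weighting may differ), and the function names `build_summing_matrix`, `sparse_hierarchical_mse` and `sparse_hierarchical_mse_grad`, as well as the toy hierarchy, are hypothetical. It uses NumPy and `scipy.sparse`.

```python
import numpy as np
import scipy.sparse as sp


def build_summing_matrix(groups, n_bottom):
    """Return a sparse matrix S with one row per aggregate plus one per bottom series.

    `groups` maps an aggregate name to the bottom-level indices it sums over.
    Rows follow the insertion order of `groups`, followed by an identity block
    so that the bottom-level series themselves are also scored.
    """
    rows, cols = [], []
    for row, members in enumerate(groups.values()):
        rows.extend([row] * len(members))
        cols.extend(members)
    n_agg = len(groups)
    # Identity rows: each bottom series maps to itself.
    rows.extend(range(n_agg, n_agg + n_bottom))
    cols.extend(range(n_bottom))
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(n_agg + n_bottom, n_bottom))


def sparse_hierarchical_mse(S, y_true, y_pred):
    """Mean squared error over all series (aggregates and bottom level)."""
    # S is linear, so aggregating the residual equals S @ y_pred - S @ y_true.
    residual = S @ (y_pred - y_true)
    return float(np.mean(residual ** 2))


def sparse_hierarchical_mse_grad(S, y_true, y_pred):
    """Gradient of the loss above with respect to the bottom-level forecasts."""
    residual = S @ (y_pred - y_true)
    return (2.0 / S.shape[0]) * (S.T @ residual)


# Hypothetical toy hierarchy: 4 products in 2 categories under one total.
groups = {"total": [0, 1, 2, 3], "category_A": [0, 1], "category_B": [2, 3]}
S = build_summing_matrix(groups, n_bottom=4)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(sparse_hierarchical_mse(S, y_true, y_pred))       # 3/7, about 0.4286
print(sparse_hierarchical_mse_grad(S, y_true, y_pred))  # [2/7, 2/7, 4/7, 6/7]
```

Because only the bottom level is forecast and every aggregate is a sum of those forecasts, coherence holds by construction, and the sparse matrix stores one entry per (aggregate, member) pair rather than a dense block. A temporal hierarchy can be expressed the same way, with rows that sum over time steps instead of products. The gradient function shows the quantity a gradient-based or boosting model would consume if this loss were used as a custom training objective.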


Related content

Loss function (machine learning)


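In AI, a loss function is sometimes also called a distance function or metric function, where "distance" is meant abstractly: it denotes the error between the true data and the predicted data. A loss function measures how far a model's prediction f(x) deviates from the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the closer the model's predictions are to the true values. The loss function is the core of the empirical risk function and an important component of the structural risk function.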
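
As a small illustration of these definitions, the sketch below computes two common losses L(Y, f(x)), the empirical risk as the average loss over a dataset, and a structural risk that adds an L2 penalty on the model parameters. The linear model f(x) = w · x, the toy data, and the penalty strength `lam` are assumptions chosen only for the example.

```python
def squared_loss(y, fx):
    """L(Y, f(x)) = (Y - f(x))^2: non-negative, zero only for a perfect prediction."""
    return (y - fx) ** 2


def absolute_loss(y, fx):
    """L(Y, f(x)) = |Y - f(x)|: penalizes large errors less heavily than squared loss."""
    return abs(y - fx)


def predict(w, x):
    """Hypothetical linear model f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_risk(loss, w, xs, ys):
    """Average loss over the training samples."""
    return sum(loss(y, predict(w, x)) for x, y in zip(xs, ys)) / len(ys)


def structural_risk(loss, w, xs, ys, lam):
    """Empirical risk plus an L2 penalty lam * ||w||^2 on the weights."""
    return empirical_risk(loss, w, xs, ys) + lam * sum(wi ** 2 for wi in w)


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 2.0]
w = [1.0, 2.0]  # predictions are [1.0, 2.0, 3.0]
print(empirical_risk(squared_loss, w, xs, ys))          # 1/3
print(structural_risk(squared_loss, w, xs, ys, 0.1))    # 1/3 + 0.5, about 0.833
```

Minimizing only the empirical risk rewards fitting the training data as closely as possible, while the penalty term in the structural risk trades some of that fit for smaller, simpler parameters.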
