Hierarchical Forecasting at Scale

Existing hierarchical forecasting techniques scale poorly as the number of time series grows. We propose learning coherent forecasts for millions of time series with a single bottom-level forecast model, using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it gives practitioners a way to produce bottom-level forecasts that are coherent with any chosen cross-sectional or temporal hierarchy. In addition, eliminating the post-processing step that traditional hierarchical forecasting techniques require reduces the computational cost of the prediction phase of the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% better (in RMSE) than the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, which improves forecasting performance by 2% at the product level. Finally, we find an improvement of about 5-10% when evaluating forecasting performance across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss when applied to a production forecasting system at a major e-commerce platform.
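To make the idea of a loss that "directly optimizes the hierarchical structure" concrete, here is a minimal sketch, not the authors' implementation. It assumes a cross-sectional hierarchy in which each aggregate series is the sum of a known set of bottom-level series, and it assumes a squared-error loss with equal weight on every series; the abstract specifies neither. The names `aggregate`, `hierarchical_squared_error`, and `groups` are hypothetical. The summing structure is stored sparsely, as index lists holding only the non-zero entries, and every aggregate forecast is derived from the bottom-level forecasts, so the forecasts are coherent by construction.

```python
from typing import Dict, List


def aggregate(bottom: List[float], groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Sum bottom-level values into each aggregate series.

    `groups` keeps only the non-zero entries of the summing matrix:
    aggregate name -> indices of the bottom-level series it contains.
    """
    return {name: sum(bottom[i] for i in idx) for name, idx in groups.items()}


def hierarchical_squared_error(
    y_true: List[float], y_pred: List[float], groups: Dict[str, List[int]]
) -> float:
    """Mean squared error over all series: aggregates plus the bottom level.

    Equal weighting of every series is an assumption made for this sketch.
    """
    agg_true = aggregate(y_true, groups)
    agg_pred = aggregate(y_pred, groups)
    errors = [(agg_pred[name] - agg_true[name]) ** 2 for name in groups]
    errors += [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return sum(errors) / len(errors)


# Hypothetical toy hierarchy: 4 products, 2 categories, 1 total.
groups = {
    "total": [0, 1, 2, 3],
    "category_a": [0, 1],
    "category_b": [2, 3],
}
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 19.0, 33.0, 40.0]

loss = hierarchical_squared_error(y_true, y_pred, groups)
print(f"hierarchical loss: {loss:.4f}")
```

Because the aggregate errors enter the loss, bottom-level errors that share a sign are penalized again at each level they roll up into, while errors that cancel within a group are not. At the scale the abstract describes, the index lists would more plausibly be replaced by a sparse matrix from a numerical library.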
