Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach minimizes the conditional negative log-likelihood of the label sequence, typically estimated with the mean squared error. However, this estimate is biased in the presence of label autocorrelation. In this paper, we propose DistDF, which instead achieves alignment by minimizing a discrepancy between the conditional forecast and label distributions. Because conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. This discrepancy admits tractable, differentiable estimation from empirical samples and integrates seamlessly with gradient-based training. Extensive experiments show that DistDF improves the performance of diverse forecast models and achieves state-of-the-art forecasting performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.
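To make the training objective concrete, below is a minimal PyTorch sketch of one way a joint-distribution discrepancy can be estimated from a mini-batch and minimized by gradient descent. This is an illustration under stated assumptions, not the paper's exact estimator: the function names (`sinkhorn_distance`, `joint_wasserstein_loss`), the squared-Euclidean ground cost, and the entropic (Sinkhorn) approximation of the Wasserstein discrepancy are choices made here for the sketch.

```python
import math
import torch


def sinkhorn_distance(cost: torch.Tensor, eps: float = 0.1, n_iters: int = 100) -> torch.Tensor:
    """Entropic-regularized OT cost <P, C> between two uniform empirical
    measures, via log-domain Sinkhorn iterations (fully differentiable).

    cost: (n, m) pairwise ground-cost matrix between the two sample sets.
    """
    n, m = cost.shape
    log_a = torch.full((n, 1), -math.log(n), dtype=cost.dtype, device=cost.device)
    log_b = torch.full((1, m), -math.log(m), dtype=cost.dtype, device=cost.device)
    f = torch.zeros(n, 1, dtype=cost.dtype, device=cost.device)
    g = torch.zeros(1, m, dtype=cost.dtype, device=cost.device)
    for _ in range(n_iters):
        # Alternating dual updates of standard log-domain Sinkhorn.
        f = -eps * torch.logsumexp((g - cost) / eps + log_b, dim=1, keepdim=True)
        g = -eps * torch.logsumexp((f - cost) / eps + log_a, dim=0, keepdim=True)
    log_plan = (f + g - cost) / eps + log_a + log_b  # transport plan, log domain
    return (log_plan.exp() * cost).sum()


def joint_wasserstein_loss(x: torch.Tensor, y_true: torch.Tensor,
                           y_pred: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Discrepancy between the empirical joint distributions of
    (lookback, forecast) and (lookback, label) pairs within a batch.

    x: (B, L, D) lookback windows; y_true, y_pred: (B, H, D) horizon sequences.
    The squared-Euclidean cost on concatenated (history, horizon) vectors is
    an illustrative choice, not prescribed by the abstract.
    """
    B = x.shape[0]
    z_model = torch.cat([x.reshape(B, -1), y_pred.reshape(B, -1)], dim=1)
    z_data = torch.cat([x.reshape(B, -1), y_true.reshape(B, -1)], dim=1)
    cost = torch.cdist(z_model, z_data, p=2).pow(2)  # pairwise squared distances
    return sinkhorn_distance(cost, eps=eps)
```

In a training loop, this loss would be minimized in place of (or alongside) the usual MSE; gradients flow to `y_pred` through the cost matrix and the Sinkhorn iterations, which is what makes the estimator compatible with standard gradient-based training.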