Assessing ex ante whether a time series is likely to be forecast accurately matters for forecasting practice because it informs how much modelling effort is warranted. We define forecastability as a property of a time series (given a declared information set) and measure horizon-specific forecastability as the reduction in uncertainty provided by the past, using auto-mutual information (AMI) at lag h. AMI is estimated from training data only, using a k-nearest-neighbour estimator, and evaluated against out-of-sample forecast error (sMAPE) on a filtered, balanced sample of 1,350 M4 series across six sampling frequencies. Seasonal Naive, ETS, and N-BEATS serve as probes of out-of-sample forecast performance. Training-only AMI provides a frequency-conditional diagnostic of forecast difficulty: for Hourly, Weekly, Quarterly, and Yearly series, AMI exhibits a consistently negative rank correlation with sMAPE across probes. Under N-BEATS, the correlation is strongest for Hourly (ρ = -0.52) and Weekly (ρ = -0.51), with Quarterly (ρ = -0.42) and Yearly (ρ = -0.36) also substantial. Monthly is probe-dependent (Seasonal Naive ρ = -0.12; ETS ρ = -0.26; N-BEATS ρ = -0.24). Daily shows a notably weaker AMI-sMAPE correlation under this protocol, suggesting that AMI has limited ability to discriminate between Daily series despite the presence of temporal dependence. The findings support within-frequency triage and effort allocation based on measurable signal content prior to forecasting, rather than between-frequency comparisons of difficulty.
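As a rough illustration of the protocol described above, the following dependency-free Python sketch estimates AMI at a given lag with a k-nearest-neighbour estimator, computes sMAPE on a held-out segment, and rank-correlates the two across series. It rests on several assumptions that are not taken from the study: the estimator is the Kraskov-Stögbauer-Grassberger (KSG) variant with k = 3 and negative estimates clipped to zero, the rank correlation is taken to be Spearman's, the data are simulated AR(1) series (the hypothetical helper `simulate_ar1`) rather than M4 series, and a flat last-value forecast stands in for the Seasonal Naive, ETS, and N-BEATS probes. All function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ami_knn(series, lag, k=3):
    """KSG-style k-NN estimate (in nats) of I(y_t; y_{t+lag}) for lag >= 1, clipped at zero."""
    x, y = series[:-lag], series[lag:]
    n = len(x)
    acc = 0.0
    for i in range(n):
        # Marginal distances from pair i to every other lagged pair.
        dx = [abs(x[i] - x[j]) for j in range(n) if j != i]
        dy = [abs(y[i] - y[j]) for j in range(n) if j != i]
        # Max-norm distance to the k-th nearest neighbour in the joint space.
        eps = sorted(max(a, b) for a, b in zip(dx, dy))[k - 1]
        # Marginal neighbour counts strictly inside that radius.
        nx = sum(d < eps for d in dx)
        ny = sum(d < eps for d in dy)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return max(0.0, digamma_int(k) + digamma_int(n) - acc / n)


def smape(actual, forecast):
    """Symmetric MAPE in percent (M4 definition); exact matches contribute zero."""
    terms = [0.0 if a == f else abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return 200.0 * sum(terms) / len(terms)


def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            out[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)


def simulate_ar1(phi, n, seed, level=100.0):
    """Hypothetical toy data: unit-variance AR(1) fluctuations around a positive level."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - phi * phi)
    z, out = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        out.append(level + z)
        z = phi * z + rng.gauss(0.0, sd)
    return out


if __name__ == "__main__":
    h, amis, errors = 1, [], []
    for seed, phi in enumerate([0.0, 0.3, 0.5, 0.7, 0.9, 0.95]):
        y = simulate_ar1(phi, 220, seed)
        train, test = y[:200], y[200:]
        amis.append(ami_knn(train, lag=h))   # AMI uses training data only
        naive = [train[-1]] * len(test)      # stand-in probe, not ETS or N-BEATS
        errors.append(smape(test, naive))
    print("Spearman rho(AMI, sMAPE):", round(spearman(amis, errors), 2))
```

The brute-force neighbour search is quadratic in series length, which is tolerable for a few hundred observations; a tree-based neighbour search would be the natural replacement for longer series. Because the correlation is computed across series, it is only meaningful within a group of comparable series, which mirrors the within-frequency use of AMI described above.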