In many systems, the true data-generating process is unknown, so forecasters must rely on the observed time series alone. This study proposes a pre-modeling diagnostic framework that assesses horizon-specific forecastability before model selection begins. Forecastability is operationalized as auto-mutual information (AMI) at lag h, which quantifies how much past observations reduce uncertainty about future values. AMI is estimated with a k-nearest-neighbor estimator computed strictly on training data to preserve out-of-sample validity. The diagnostic signal is validated against realized out-of-sample symmetric mean absolute percentage error (sMAPE) across 42,355 time series spanning six temporal frequencies, using benchmark and higher-capacity probe models under a rolling-origin protocol. The results reveal a strong frequency-dependent relationship between measurable dependence and realized forecast error. For five of six frequencies, AMI exhibits a consistent negative rank association with realized error, supporting its use as a forecast triage signal for modeling investment decisions; daily series show weaker discrimination despite measurable dependence. Across all frequencies, median forecast error declines monotonically from the low to the high forecastability tercile, demonstrating clear decision-relevant separation. Overall, the findings establish measurable past-future dependence as a practical screening tool for analytics-driven forecasting strategy. The diagnostic identifies when advanced models are likely to add value, when simple baselines suffice, and when attention should shift from accuracy improvement to robust decision design, supporting a diagnostic-first approach to modeling effort and resource allocation in organizational forecasting.
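To make the diagnostic concrete, the following is a minimal sketch of auto-mutual information at lag h computed from a training window only. The abstract does not say which k-nearest-neighbor estimator is used, so the Kraskov-Stoegbauer-Grassberger (algorithm 1) estimator, the default k = 3, the clipping of negative estimates to zero, and all function names here are assumptions made for illustration, not the study's implementation.

```python
import math
import random


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """KSG (algorithm 1) kNN estimate of I(X;Y) in nats. Assumed estimator."""
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("need equal-length samples with more than k points")
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in the joint space.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest joint neighbor
        # Count marginal neighbors strictly inside that radius.
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def auto_mutual_information(train, h, k=3):
    """AMI at lag h using only the training window (hypothetical helper).

    Pairs each value with the value h steps later. The result is clipped
    at 0 because the kNN estimate can dip slightly below zero.
    """
    if h < 1 or len(train) - h <= k:
        raise ValueError("training window too short for this lag")
    past, future = train[:-h], train[h:]
    return max(0.0, ksg_mutual_information(past, future, k))


if __name__ == "__main__":
    rng = random.Random(0)
    series = [0.0]
    for _ in range(299):
        series.append(0.9 * series[-1] + rng.gauss(0, 1))  # synthetic AR(1)
    train = series[:240]  # the last 60 points are held out and never touched
    for h in (1, 5, 10):
        print(f"h={h}: AMI = {auto_mutual_information(train, h):.3f} nats")
```

This brute-force version is quadratic in the window length, so it suits short windows only. For larger collections, a tree-based neighbor search or a library routine such as scikit-learn's `mutual_info_regression` (which exposes an `n_neighbors` parameter) may be a more practical starting point, though its estimator details can differ from the sketch above.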