Time series forecasting is critical to real-world decision making, yet most existing approaches remain unimodal and rely on extrapolating historical patterns. While recent progress in large language models (LLMs) highlights the potential for multimodal forecasting, existing benchmarks largely provide retrospective or misaligned raw context, making it unclear whether such models meaningfully leverage textual inputs. In practice, human experts combine what-if scenarios with historical evidence, often producing distinct forecasts from the same observations under different scenarios. Inspired by this, we introduce What If TSF (WIT), a multimodal forecasting benchmark designed to evaluate whether models can condition their forecasts on contextual text, especially descriptions of future scenarios. By providing expert-crafted plausible or counterfactual scenarios, WIT offers a rigorous testbed for scenario-guided multimodal forecasting. The benchmark is available at https://github.com/jinkwan1115/WhatIfTSF.