FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

Time series (TS) reasoning models (TSRMs) have shown promising capabilities in general domains, yet they consistently fail in the financial domain, which exhibits unique characteristics. We propose a general 2×2 capability taxonomy for TSRMs by crossing (1) single-entity vs. multi-entity analysis with (2) assessment of the current state vs. prediction of future behavior. We instantiate this taxonomy in the financial domain, where the distinction between deterministic assessment and stochastic prediction is particularly critical, as ten financial reasoning tasks that form the FinTSR-Bench benchmark, built on S&P stocks. On top of this benchmark, we propose FinSTaR (Financial Time Series Thinking and Reasoning), trained on FinTSR-Bench with distinct chain-of-thought (CoT) strategies tailored to each category. For assessment, which is deterministic (i.e., computable from observable data), we employ Compute-in-CoT, a programmatic CoT that enables models to derive answers directly from raw prices. For prediction, which is inherently stochastic (i.e., subject to unobservable factors), we adopt Scenario-Aware CoT, which generates diverse scenarios before making a judgment, mirroring how financial analysts reason under uncertainty. FinSTaR achieves 78.9% average accuracy on FinTSR-Bench, substantially outperforming LLM and TSRM baselines. Furthermore, we show that the four capability categories are complementary and mutually reinforcing under joint training, and that Scenario-Aware CoT consistently improves prediction accuracy over standard CoT. Code is available at https://github.com/seunghan96/FinSTaR.
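
To make the taxonomy and the Compute-in-CoT idea concrete, the following is a minimal illustrative sketch, not the FinSTaR implementation. It enumerates the four capability categories and pairs each with the CoT style named in the abstract, then shows what a programmatic trace for one assessment question might look like. The function names, the "window return" task, and the step format are all assumptions introduced here for illustration; the actual ten tasks and prompt formats are defined in the FinSTaR repository.

```python
from itertools import product

# The two axes of the 2x2 taxonomy described in the abstract.
SCOPES = ("single-entity", "multi-entity")
TARGETS = ("assessment", "prediction")


def cot_strategy(target: str) -> str:
    """Map the assessment/prediction axis to the CoT style named in the abstract."""
    if target == "assessment":
        return "Compute-in-CoT"      # deterministic: computable from observed data
    if target == "prediction":
        return "Scenario-Aware CoT"  # stochastic: reason over several scenarios
    raise ValueError(f"unknown target: {target!r}")


def taxonomy() -> list[tuple[str, str, str]]:
    """Enumerate the four capability categories with their CoT strategy."""
    return [(s, t, cot_strategy(t)) for s, t in product(SCOPES, TARGETS)]


def compute_in_cot_return(prices: list[float]) -> tuple[list[str], str]:
    """Hypothetical single-entity assessment: direction of the window return.

    Returns the explicit computation steps (the "programmatic CoT") and the
    final label. The task and the step wording are assumptions for this sketch.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    first, last = prices[0], prices[-1]
    pct = (last - first) / first * 100.0
    steps = [
        f"first price = {first}",
        f"last price = {last}",
        f"return = ({last} - {first}) / {first} * 100 = {pct:.2f}%",
    ]
    answer = "up" if pct > 0 else "down" if pct < 0 else "flat"
    return steps, answer


if __name__ == "__main__":
    for scope, target, strategy in taxonomy():
        print(f"{scope:13s} x {target:10s} -> {strategy}")
    steps, answer = compute_in_cot_return([100.0, 102.0, 110.0])
    print("\n".join(steps))
    print("answer:", answer)
```

The point of the sketch is the asymmetry between the two columns: an assessment answer follows mechanically from the listed steps, whereas a prediction answer cannot be derived this way and therefore calls for a different reasoning style.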
