Sparse Latent Factor Forecaster (SLFF) with Iterative Inference for Transparent Multi-Horizon Commodity Futures Prediction

Amortized variational inference in latent-variable forecasters creates a deployment gap. At test time, the encoder must approximate a latent that was refined by optimization during training, but it has no access to the future targets that drove that refinement. This gap adds avoidable forecast error and makes the latents harder to interpret. We propose the Sparse Latent Factor Forecaster with Iterative Inference (SLFF), which addresses the gap through three components: (i) a sparse coding objective with L1 regularization that yields sparse, low-dimensional latents; (ii) unrolled proximal gradient descent (LISTA-style) for iterative latent refinement during training; and (iii) an encoder alignment term that trains the amortized encoder outputs to match the optimization-refined solutions. Under a linearized decoder assumption, we derive a design-motivating bound on the amortization gap in terms of the encoder-optimizer distance, together with convergence rates under mild conditions. Empirical checks indicate that the bound remains predictive for the deployed MLP decoder. To prevent look-ahead leakage from mixed-frequency data, we introduce an information-set-aware protocol based on release calendars and vintage macroeconomic data. We formalize interpretability through a three-stage protocol: stability (Procrustes alignment across seeds), driver validity (held-out regressions against observables), and behavioral consistency (counterfactuals and event studies). Using commodity futures (Copper, WTI, Gold; 2005–2025) as a testbed, SLFF significantly improves on neural baselines at 1- and 5-day horizons. It also yields sparse factors that are stable across seeds and correlated with observable economic fundamentals; this interpretability is correlational, not causal. Code, manifests, diagnostics, and artifacts are released.
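As a rough illustration of components (i) to (iii), here is a minimal sketch that runs plain ISTA on an assumed L1 sparse-coding objective 0.5·‖y − Dz‖² + λ‖z‖₁. It uses a linear decoder D and a fixed step size rather than learned LISTA parameters, warm-starts from an encoder's guess, and then measures the squared encoder-optimizer distance that an alignment term would penalize. The linear decoder, the exact objective, and every function name (`ista_refine`, `alignment_loss`, and so on) are hypothetical choices for illustration, not SLFF's released code. With a linear decoder, the forecast discrepancy satisfies ‖Dz_enc − Dz_K‖ ≤ ‖D‖₂·‖z_enc − z_K‖. That inequality gives one simple intuition for why shrinking the encoder-optimizer distance shrinks the amortization gap.

```python
import math
import random


def matvec(A, x):
    """Dense matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def lipschitz_bound(D):
    # ||D||_F^2 upper-bounds the largest eigenvalue of D^T D, so a step of
    # 1/L is a safe (if conservative) ISTA step size.
    return sum(d * d for row in D for d in row)


def objective(D, y, z, lam):
    """Hypothetical sparse-coding objective: 0.5*||y - Dz||^2 + lam*||z||_1."""
    r = [yi - pi for yi, pi in zip(y, matvec(D, z))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in z)


def ista_refine(D, y, z0, lam, n_steps):
    """Unrolled proximal gradient (ISTA) warm-started at the encoder guess z0.

    LISTA would learn the step and threshold per unrolled layer; here both
    are fixed at 1/L and lam/L, which makes the objective non-increasing.
    """
    L = lipschitz_bound(D)
    Dt = transpose(D)
    z = list(z0)
    for _ in range(n_steps):
        residual = [yi - pi for yi, pi in zip(y, matvec(D, z))]
        grad = [-g for g in matvec(Dt, residual)]  # gradient of the smooth term
        z = soft_threshold([zi - gi / L for zi, gi in zip(z, grad)], lam / L)
    return z


def alignment_loss(z_enc, z_refined):
    """Squared encoder-optimizer distance an alignment term would penalize."""
    return sum((a - b) ** 2 for a, b in zip(z_enc, z_refined))


if __name__ == "__main__":
    random.seed(0)
    D = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    z_true = [1.5, 0.0, 0.0, -2.0]           # sparse ground-truth latent
    y = matvec(D, z_true)                     # noiseless toy observation
    z_enc = [0.0] * 4                         # stand-in for an encoder output
    z_ref = ista_refine(D, y, z_enc, lam=0.1, n_steps=500)
    print("objective at encoder guess:", objective(D, y, z_enc, 0.1))
    print("objective after refinement:", objective(D, y, z_ref, 0.1))
    print("alignment loss:", alignment_loss(z_enc, z_ref))
```

In an actual training loop, the alignment loss would be minimized with respect to the encoder parameters while the refined latent is treated as a fixed target. That is the sense in which the amortized output is pushed toward the optimization-refined solution.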