We study Learning to Defer for non-stationary time series with partial feedback and time-varying expert availability. At each time step, the router selects an available expert, observes the target, and sees only the queried expert's prediction. We model signed expert residuals using L2D-SLDS, a factorized switching linear-Gaussian state-space model with context-dependent regime transitions, a shared global factor that enables cross-expert information transfer, and per-expert idiosyncratic states. The model supports expert entry and pruning via a dynamic registry. Using one-step-ahead predictive beliefs, we propose a routing rule inspired by information-directed sampling (IDS) that trades off predicted cost against the information gained about the latent regime and shared factor. Experiments show improvements over contextual-bandit baselines and over a no-shared-factor ablation.
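To make the routing trade-off concrete, the following is a minimal sketch, not the rule used in the paper. It assumes the model has already produced, for each expert, a one-step-ahead predicted cost and an estimate of the information a query would yield about the latent regime and shared factor. The function name `route_expert`, the weight `lam`, and the additive score are all hypothetical simplifications; IDS proper minimizes a ratio of squared expected regret to information gain rather than a linear combination.

```python
import numpy as np

def route_expert(pred_cost, info_gain, available, lam=1.0):
    """Pick one available expert by a cost-vs-information score.

    pred_cost : predicted one-step-ahead cost per expert (lower is better)
    info_gain : estimated information gain per expert (higher is better)
    available : boolean mask of experts that can be queried at this step
    lam       : hypothetical trade-off weight (assumption, not from the paper)
    """
    pred_cost = np.asarray(pred_cost, dtype=float)
    info_gain = np.asarray(info_gain, dtype=float)
    idx = np.flatnonzero(np.asarray(available, dtype=bool))
    if idx.size == 0:
        raise ValueError("no expert is available at this step")
    # Lower score is better: cheap experts and informative queries are favored.
    score = pred_cost[idx] - lam * info_gain[idx]
    return int(idx[np.argmin(score)])

# Expert 2 is cheapest but unavailable; expert 1 costs slightly more than
# expert 0 but is expected to be far more informative.
costs = [1.0, 1.2, 0.5]
gains = [0.0, 0.5, 0.0]
mask = [True, True, False]
choice = route_expert(costs, gains, mask, lam=1.0)
```

With `lam=0` the sketch reduces to greedy cost minimization over the available experts, which is the behavior an information-aware rule is meant to improve on when only the queried expert's prediction is observed.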