Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
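To make the composite expert--advice action space concrete, the following minimal sketch compares a joint decision over (expert, advice) pairs with a decision that picks the expert and the advice independently. Everything in it is an illustrative assumption rather than the paper's construction: the cost table `COSTS`, the function names, and in particular the marginal-averaging rule used for the independent choice, which is just one hypothetical way two separate heads might decide. It illustrates the decision problem only; it is not the surrogate loss or the inconsistency proof.

```python
from itertools import product

# Hypothetical expected costs for ONE input x: COSTS[expert][advice].
# Each entry is assumed to already include the price of acquiring the advice.
COSTS = [
    [0.50, 0.45],  # expert 0: without advice, with advice
    [0.90, 0.10],  # expert 1: without advice, with advice
]


def composite_decision(costs):
    """Pick the (expert, advice) pair with the lowest expected cost."""
    pairs = product(range(len(costs)), range(len(costs[0])))
    return min(pairs, key=lambda p: costs[p[0]][p[1]])


def separated_decision(costs):
    """Pick expert and advice independently from marginal average costs.

    This marginal-averaging rule is an assumption made for illustration;
    it is not the family of separated surrogates analyzed in the paper.
    """
    n_experts, n_advice = len(costs), len(costs[0])
    expert = min(range(n_experts), key=lambda j: sum(costs[j]) / n_advice)
    advice = min(
        range(n_advice),
        key=lambda a: sum(row[a] for row in costs) / n_experts,
    )
    return expert, advice


joint = composite_decision(COSTS)
split = separated_decision(COSTS)
print("composite:", joint, "cost", COSTS[joint[0]][joint[1]])
print("separated:", split, "cost", COSTS[split[0]][split[1]])
```

In this toy table the joint rule selects expert 1 with advice, while the independent rule selects expert 0 with advice and pays a higher cost, because expert 1 is only attractive once the advice is supplied. Interactions of this kind are what a decision over the composite action space can capture and two independent choices can miss.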