Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
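To make the composite expert--advice action space concrete, the following minimal sketch selects, for a single input, the (expert, advice) pair with the lowest expected cost. It is an illustration only and not the surrogate or training procedure studied here: the cost table, the expert and advice names, and the helper `bayes_optimal_action` are all hypothetical, and the expected costs are assumed to be known and to already include any advice-acquisition cost.

```python
from itertools import product


def bayes_optimal_action(expected_cost, experts, advice_options):
    """Return the (expert, advice) pair with the lowest expected cost.

    `expected_cost` maps each (expert, advice) pair to a hypothetical
    expected cost for one fixed input; in practice these values are
    unknown and must be learned.
    """
    return min(product(experts, advice_options), key=lambda pair: expected_cost[pair])


# Hypothetical expected costs for a single input (assumed, not from the paper).
# Each value is taken to include the cost of acquiring the advice.
expected_cost = {
    ("model", "none"): 0.40,
    ("model", "retrieved_docs"): 0.25,
    ("human", "none"): 0.30,
    ("human", "retrieved_docs"): 0.35,
}
experts = ["model", "human"]
advice_options = ["none", "retrieved_docs"]

# Joint decision over the composite action space.
best_pair = bayes_optimal_action(expected_cost, experts, advice_options)

# Standard Learning-to-Defer, viewed here as fixing the advice to "none"
# and choosing only the expert.
best_expert_without_advice = min(experts, key=lambda j: expected_cost[(j, "none")])

print(best_pair)
print(best_expert_without_advice)
```

With these assumed numbers, routing without advice defers to the human, whereas the joint decision keeps the model and supplies it with retrieved documents, which is the kind of behavior a fixed-information router cannot express.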