Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where the predictor and the allocation function are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees, including $\mathcal{H}$-, $(\mathcal{R}, \mathcal{F})$-, and Bayes-consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.