The problem of learning to defer with multiple experts consists of optimally assigning input instances to experts, balancing the trade-off between their accuracy and computational cost. This is a critical challenge in natural language generation, but also in other fields such as image processing and medical diagnostics. Recent studies have proposed surrogate loss functions to optimize deferral, but challenges remain in ensuring their consistency properties. This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage (jointly learning the predictor and deferral function) and two-stage (learning only the deferral function with a fixed expert) learning scenarios. For single-stage deferral, we introduce a family of new realizable $H$-consistent surrogate losses and further prove $H$-consistency for a selected member. For two-stage deferral, we derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, the multiple-expert scenario. Additionally, we provide enhanced theoretical guarantees under low-noise assumptions for both scenarios. Finally, we report the results of experiments using our proposed surrogate losses, comparing their performance against existing baselines.