Learning to Defer (L2D) enables a classifier to abstain from predicting and defer to an expert, and has recently been extended to multi-expert settings. In this work, we show that multi-expert L2D is fundamentally more challenging than the single-expert case. With multiple experts, underfitting of the classifier becomes inherent and seriously degrades prediction performance, whereas in the single-expert setting it arises only under specific conditions. We show theoretically that this stems from an intrinsic expert identifiability issue: learning which expert to trust from a diverse pool, a problem that is absent in the single-expert case and that causes existing underfitting remedies to fail. To tackle this issue, we propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert based on empirical evidence. PiCCE effectively reduces multi-expert L2D to a single-expert-like learning problem, thereby resolving multi-expert underfitting. We further prove its statistical consistency and its ability to recover class probabilities and expert accuracies. Extensive experiments across diverse settings, including real-world expert scenarios, validate our theoretical results and demonstrate improved performance.