The learning to defer (L2D) framework allows autonomous systems to be safe and robust by allocating difficult decisions to a human expert. All existing work on L2D assumes that each expert is well-identified, so that if any expert changes, the system must be re-trained. In this work, we relax this constraint, formulating an L2D system that can cope with never-before-seen experts at test time. We accomplish this with meta-learning, considering both optimization-based and model-based variants. Given a small context set that characterizes the currently available expert, our framework can quickly adapt its deferral policy. For the model-based approach, we employ an attention mechanism that looks for points in the context set that are similar to a given test point, leading to an even more precise assessment of the expert's abilities. In our experiments, we validate our methods on image recognition, traffic sign detection, and skin lesion diagnosis benchmarks.
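As a rough illustration of the attention idea described above, the following sketch estimates how likely an expert is to be correct on a test point by weighting the expert's record on the context set by similarity to that point. It is not the architecture used in this work: it assumes plain scaled dot-product attention on raw feature vectors with no learned projections, and the names `estimate_expert_correctness`, `should_defer`, `ctx_feats`, `ctx_correct`, and `classifier_confidence` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def estimate_expert_correctness(query, ctx_feats, ctx_correct):
    """Attention-weighted estimate of the expert's accuracy near `query`.

    query:       (d,) feature vector of the test point
    ctx_feats:   (n, d) feature vectors of the context points
    ctx_correct: (n,) 1.0 where the expert was correct on a context point, else 0.0
    """
    # Scaled dot-product similarity between the test point and each context point.
    scores = ctx_feats @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    # Context points similar to the query dominate the estimate.
    return float(weights @ ctx_correct)


def should_defer(query, ctx_feats, ctx_correct, classifier_confidence):
    """Assumed rule: defer when the expert looks more reliable than the classifier."""
    return estimate_expert_correctness(query, ctx_feats, ctx_correct) > classifier_confidence


# Toy context set: the expert is reliable in one region of feature space only.
ctx_feats = np.array([[5.0, 0.0], [5.0, 0.1], [0.0, 5.0], [0.1, 5.0]])
ctx_correct = np.array([1.0, 1.0, 0.0, 0.0])

near_strength = estimate_expert_correctness(np.array([5.0, 0.0]), ctx_feats, ctx_correct)
near_weakness = estimate_expert_correctness(np.array([0.0, 5.0]), ctx_feats, ctx_correct)
```

In this toy setup the estimate is high for test points resembling those the expert handled correctly and low elsewhere, which is what lets the deferral decision vary per test point rather than per expert.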