In human-AI collaboration systems for critical applications, users should set an operating point on model confidence to minimize errors; this operating point determines when a decision should be delegated to human experts. Samples for which the model's confidence falls below the operating point are analysed manually by experts to avoid mistakes. Such systems become truly useful only if they address two aspects: models should be confident only on samples for which they are accurate, and the number of samples delegated to experts should be minimized. The latter is especially crucial in applications where expert time is limited and expensive, such as healthcare. The trade-off between model accuracy and the number of samples delegated to experts can be represented by a curve similar to an ROC curve, which we refer to as the confidence operating characteristic (COC) curve. In this paper, we argue that deep neural networks should be trained by taking both accuracy and expert load into account and, to that end, propose a new complementary loss function for classification that maximizes the area under the COC curve. This simultaneously promotes higher network accuracy and a reduction in the number of samples delegated to humans. We perform experiments on multiple computer vision and medical image classification datasets. Our results demonstrate that, compared with existing loss functions, the proposed loss improves classification accuracy, delegates fewer decisions to experts, and achieves better out-of-distribution sample detection, while attaining on-par calibration performance.
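To make the operating point and the COC curve concrete, here is a minimal evaluation sketch. It assumes the curve plots the fraction of samples delegated to experts (x-axis) against the model's accuracy on the samples it keeps (y-axis), that the lowest-confidence samples are delegated first, and that accuracy is taken as 1.0 when every sample is delegated; the abstract does not fix these details. The function names `operating_point`, `coc_curve` and `area_under_coc` are hypothetical. This computes the evaluation curve only and is not the proposed training loss.

```python
def operating_point(confidences, correct, threshold):
    """Delegate samples whose confidence is below `threshold` to experts.

    Returns (fraction delegated, model accuracy on the retained samples).
    Assumption: accuracy is reported as 1.0 if nothing is retained.
    """
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    delegated_fraction = 1 - len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else 1.0
    return delegated_fraction, accuracy


def coc_curve(confidences, correct):
    """Sweep the operating point: delegate the k least-confident samples, k = 0..n."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    # Sample indices in ascending order of confidence (ties broken arbitrarily).
    order = sorted(range(n), key=lambda i: confidences[i])
    # suffix[k] = number of correct predictions among the samples that are kept
    # when the k least-confident samples are delegated.
    suffix = [0] * (n + 1)
    for pos in range(n - 1, -1, -1):
        suffix[pos] = suffix[pos + 1] + (1 if correct[order[pos]] else 0)
    xs, ys = [], []
    for k in range(n + 1):
        xs.append(k / n)                                # fraction delegated
        ys.append(suffix[k] / (n - k) if k < n else 1.0)  # accuracy on kept samples
    return xs, ys


def area_under_coc(xs, ys):
    """Trapezoidal-rule area under the COC curve over x in [0, 1]."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.3, 0.6]
    ok = [True, True, False, True]
    print(operating_point(conf, ok, threshold=0.5))
    xs, ys = coc_curve(conf, ok)
    print(xs, ys, area_under_coc(xs, ys))
```

In this toy example, the only error has the lowest confidence, so delegating a single sample already lifts the retained accuracy to 1.0. A model whose confident predictions are the accurate ones reaches high accuracy with few delegations and therefore has a larger area under the curve, which is the behaviour the abstract argues training should encourage.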