Cascades are a classical strategy for letting inference cost vary adaptively across samples: a sequence of classifiers is invoked in turn, and a deferral rule determines whether to invoke the next classifier in the sequence or to terminate prediction. One simple deferral rule uses the confidence of the current classifier, e.g., its maximum predicted softmax probability. Despite being oblivious to the structure of the cascade (e.g., it does not model the errors of downstream models), such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternative deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely identifies settings in which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate that they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test sets.
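As an illustration only, the confidence-based deferral rule described above can be sketched for a two-model cascade as follows. The function names, the toy logits, and the threshold of 0.7 are hypothetical choices for this example, not the paper's experimental setup. The large model is called only on samples whose maximum softmax probability under the small model falls below the threshold, which is what lets inference cost vary per sample.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

def max_softmax_prob(logits: Sequence[float]) -> float:
    """Maximum predicted softmax probability (the 'confidence' of a classifier)."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])

def confidence_cascade(
    inputs: Sequence[Hashable],
    small_model: Callable[[Hashable], Sequence[float]],
    large_model: Callable[[Hashable], Sequence[float]],
    threshold: float,
) -> Tuple[List[int], List[bool]]:
    """Hypothetical two-stage cascade: keep the small model's prediction when its
    confidence is at least `threshold`, otherwise defer to the large model."""
    preds, deferred = [], []
    for x in inputs:
        s = small_model(x)
        if max_softmax_prob(s) >= threshold:
            preds.append(argmax(s))             # terminate prediction early
            deferred.append(False)
        else:
            preds.append(argmax(large_model(x)))  # invoke the next classifier
            deferred.append(True)
    return preds, deferred

# Toy "models": fixed logits per input (hypothetical values).
SMALL = {"a": [4.0, 0.0, 0.0], "b": [1.0, 0.9, 0.8], "c": [0.0, 3.0, 0.0]}
LARGE = {"a": [0.0, 5.0, 0.0], "b": [0.0, 0.0, 5.0], "c": [5.0, 0.0, 0.0]}
large_calls = []  # records which inputs actually reached the large model

def large_model(x):
    large_calls.append(x)
    return LARGE[x]

preds, deferred = confidence_cascade(["a", "b", "c"], SMALL.__getitem__, large_model, threshold=0.7)
print(preds, deferred, large_calls)
```

Only input `"b"` (confidence about 0.37) is deferred; `"a"` and `"c"` are answered by the small model alone. The rule never consults the downstream model's likely errors, which is the obliviousness to cascade structure noted above; a post-hoc deferral mechanism would instead replace the `max_softmax_prob(s) >= threshold` test with a different, e.g. learned, decision.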