Counterfactually Comparing Abstaining Classifiers

arXiv preprint. Accepted to NeurIPS 2023. Preliminary work presented at the ICML 2023 Workshop on Counterfactuals in Minds and Machines. Code available at https://github.com/yjchoe/ComparingAbstainingClassifiers

Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating one or more black-box abstaining classifiers, however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions matter when they can eventually be utilized, either directly or as a backup option in a failure mode. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach centers on the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Conversely, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. We examine our approach in both simulated and real data experiments.
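To make the estimation idea concrete, the following is a minimal sketch of a generic doubly robust (augmented inverse-probability-weighted) estimate of the counterfactual score on simulated data. It is not the authors' implementation (see the repository linked above for that). The function name `dr_counterfactual_score`, its arguments, and the simulated abstention and accuracy curves are all assumptions made for illustration. The sketch assumes stochastic abstentions with a known abstention probability strictly below 1 and scores that are missing at random given the input.

```python
import numpy as np


def dr_counterfactual_score(score, abstained, abstain_prob, score_model):
    """Doubly robust (AIPW) estimate of the mean score had no abstention occurred.

    score        : per-example score (e.g. 0/1 correctness); ignored where abstained
    abstained    : boolean array, True where the classifier abstained
    abstain_prob : P(abstain | x) per example; must be < 1 (stochastic abstention)
    score_model  : a guess of E[score | x] per example (may be misspecified)
    Returns (estimate, standard_error).
    """
    score = np.asarray(score, dtype=float)
    abstained = np.asarray(abstained, dtype=bool)
    predict_prob = 1.0 - np.asarray(abstain_prob, dtype=float)
    score_model = np.asarray(score_model, dtype=float)
    if np.any(predict_prob <= 0.0):
        raise ValueError("abstentions must be stochastic: abstain_prob < 1 everywhere")

    # Residuals are only available where a prediction was actually made.
    residual = np.where(abstained, 0.0, np.where(abstained, 0.0, score) - score_model)
    # Outcome-model term plus an inverse-probability-weighted correction.
    phi = score_model + residual / predict_prob
    return phi.mean(), phi.std(ddof=1) / np.sqrt(phi.shape[0])


# Hypothetical simulation: harder inputs (larger x) are both less accurate
# and more likely to trigger an abstention.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
true_score_prob = 0.9 - 0.4 * x   # E[score | x]; averages to 0.70 over x
abstain_prob = 0.2 + 0.6 * x      # never reaches 1, so positivity holds

score = rng.binomial(1, true_score_prob).astype(float)
abstained = rng.uniform(size=n) < abstain_prob
score[abstained] = np.nan         # predictions on abstentions are unobserved

# Mean over non-abstained inputs only; its population value here is 0.74.
naive = np.nanmean(score)

# Deliberately crude outcome model (constant 0.5): the estimate stays
# consistent because the abstention probabilities are correct.
est, se = dr_counterfactual_score(score, abstained, abstain_prob, np.full(n, 0.5))
print(f"selective (observed-only) score: {naive:.3f}")
print(f"doubly robust counterfactual score: {est:.3f} +/- {1.96 * se:.3f}")
```

In this simulation the observed-only average overstates performance because the classifier abstains more often where it is less accurate, while the doubly robust estimate targets the score over all inputs. Comparing two abstaining classifiers would, under the same assumptions, amount to differencing their per-example `phi` terms on a shared evaluation set.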
