Two fundamental requirements for deploying machine learning models in safety-critical systems are the ability to detect out-of-distribution (OOD) data correctly and the ability to explain the model's predictions. Although significant effort has gone into both OOD detection and explainable AI, there has been little work on explaining why a model predicts that a given data point is OOD. In this paper, we address this question by introducing the concept of an OOD counterfactual: a perturbed data point that iteratively moves between different OOD categories. We propose a method for generating such counterfactuals, investigate its application to synthetic and benchmark data, and compare it to several benchmark methods using a range of metrics.