In this paper we present a heuristic method that provides individual explanations for the elements of a dataset (data points) that a given classifier predicts wrongly. Since the general case is too difficult, in the present work we focus on faulty data from an underfitted model. First, we project the faulty data into a hand-crafted, and thus human-readable, intermediate representation (meta-representation, profile vectors), with the aim of separating the two main causes of misclassification: either the classifier is not strong enough, or the data point belongs to an area of the input space where the classes are not separable. Second, in the space of these profile vectors, we present a method to fit a meta-classifier (a decision tree) and express its output as a set of interpretable (human-readable) explanation rules, which lead to the following target diagnosis labels: the data point is either correctly classified, faulty due to a too-weak model, or faulty due to mixed (overlapping) classes in the input space. Experimental results on several real datasets show more than 80% diagnosis label accuracy and confirm that the proposed intermediate representation achieves a high degree of invariance with respect to both the classifier used in the input space and the dataset being classified; that is, we can learn the meta-classifier on one dataset with a given classifier and successfully predict diagnosis labels for a different dataset, a different classifier, or both.