Several explainable AI methods allow a machine learning user to gain insight into the classification process of a black-box model in the form of local linear explanations. With such information, the user can judge which features are locally relevant to the classification outcome and understand how the model reasons. Standard supervised learning is driven purely by the original features and target labels, with no feedback loop informed by the local feature relevance that post-hoc explanations identify. In this paper, we exploit this additional information to design a feature engineering phase in which we combine explanations with feature values. To do so, we develop two strategies, named Iterative Dataset Weighting and Targeted Replacement Values, which generate streamlined models that better mimic the explanation process presented to the user. We show how these streamlined models compare to the original black-box classifiers in terms of accuracy and of the compactness of the newly produced explanations.