To address the issues of stability and fidelity in interpretable learning, this paper presents ensemble interpretation, a novel interpretation methodology that integrates the multi-perspective explanations produced by various interpretation methods. First, we define a unified paradigm that describes the common mechanism of different interpretation methods, and then integrate their interpretation results to achieve a more stable explanation. Second, we propose a supervised evaluation method based on prior knowledge to assess the explanatory performance of an interpretation method. Experimental results show that ensemble interpretation is more stable and more consistent with human experience and cognition. As an application, we use ensemble interpretation for feature selection, which significantly improves the generalization performance of the corresponding learning model.
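The abstract does not state how the individual interpretation results are combined or how agreement with prior knowledge is scored, so the following is only a minimal sketch under stated assumptions: each interpretation method is assumed to return one attribution score per feature, the ensemble is assumed to be a plain average of magnitude-normalized scores, and the supervised evaluation is approximated by the fraction of top-ranked features that fall in a known-relevant set. The function names (`normalize`, `ensemble_interpretation`, `select_top_k`, `prior_agreement`) and the example scores are hypothetical and are not taken from the paper.

```python
def normalize(attr):
    """Scale absolute attribution scores so that they sum to 1."""
    magnitudes = [abs(v) for v in attr]
    total = sum(magnitudes)
    if total == 0:
        # No signal at all: fall back to a uniform distribution.
        return [1.0 / len(magnitudes)] * len(magnitudes)
    return [m / total for m in magnitudes]


def ensemble_interpretation(attributions):
    """Average the normalized attribution vectors of several methods.

    Assumption: simple unweighted averaging; the paper may use another rule.
    """
    normalized = [normalize(a) for a in attributions]
    n_methods = len(normalized)
    return [sum(column) / n_methods for column in zip(*normalized)]


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def prior_agreement(scores, relevant, k):
    """Fraction of the top-k features that prior knowledge marks as relevant.

    Assumption: a stand-in for the supervised evaluation in the abstract.
    """
    top = select_top_k(scores, k)
    return sum(1 for i in top if i in relevant) / k


# Hypothetical attribution scores for 4 features from three methods.
attributions = [
    [0.8, 0.1, -0.05, 0.05],
    [2.0, 1.0, 0.5, 0.5],
    [0.3, 0.4, 0.2, 0.1],
]
scores = ensemble_interpretation(attributions)
top2 = select_top_k(scores, 2)
agreement = prior_agreement(scores, relevant={0, 2}, k=2)
```

Normalizing before averaging keeps a method with large raw scores from dominating the ensemble. The selected indices in `top2` could then be used to restrict the input features of a downstream model, in the spirit of the feature-selection application described above.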