Personalized Interpretable Classification

How to interpret a data mining model has received much attention recently, because people may distrust a black-box predictive model if they do not understand how it works. A model is therefore more trustworthy if it can transparently show how it reaches a decision. Although many rule-based interpretable classification algorithms have been proposed, none of the existing solutions can directly construct an interpretable model that provides a personalized prediction for each individual test sample. In this paper, we take a first step towards formally introducing personalized interpretable classification as a new data mining problem. In addition to formulating this new problem, we present a greedy algorithm called PIC (Personalized Interpretable Classifier) that identifies a personalized rule for each individual test sample. To improve running efficiency, we also present a fast approximate algorithm called fPIC. To demonstrate the necessity, feasibility and advantages of such a personalized interpretable classification method, we conduct a series of empirical studies on real data sets. The experimental results show that: (1) The new problem formulation enables us to find interesting rules for test samples that may be missed by existing non-personalized classifiers. (2) Our algorithms achieve predictive accuracy on par with state-of-the-art (SOTA) interpretable classifiers. (3) On a real data set for predicting breast cancer metastasis, such personalized interpretable classifiers can outperform SOTA methods in terms of both accuracy and interpretability.
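The abstract does not spell out how PIC builds its rules, so the following is only a minimal illustrative sketch of the general idea of a greedy, per-sample rule search, not the authors' algorithm. It assumes categorical attributes, a non-empty training set, and a rule quality measure based on confidence (the fraction of covered training samples sharing the majority label); the function name `personalized_rule` and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def personalized_rule(train_X, train_y, test_x, min_support=2):
    """Greedily grow a conjunctive rule whose conditions all hold for test_x.

    Illustrative assumption only: rule quality is measured by confidence on
    the training samples the rule covers. This is NOT the PIC algorithm.
    """
    covered = list(range(len(train_X)))   # training rows matching the rule so far
    conditions = []                       # chosen (attribute index, value) pairs
    remaining = set(range(len(test_x)))   # attributes not yet used in the rule

    def majority(indices):
        # Majority label among the given rows, and its relative frequency.
        counts = Counter(train_y[i] for i in indices)
        label, count = counts.most_common(1)[0]
        return label, count / len(indices)

    label, confidence = majority(covered)
    while remaining and confidence < 1.0:
        best = None
        for a in sorted(remaining):
            # Candidate condition: attribute a equals the test sample's value.
            subset = [i for i in covered if train_X[i][a] == test_x[a]]
            if len(subset) < min_support:
                continue
            cand_label, cand_conf = majority(subset)
            if cand_conf > confidence and (best is None or cand_conf > best[0]):
                best = (cand_conf, a, subset, cand_label)
        if best is None:
            break  # no remaining condition improves confidence
        confidence, a, covered, label = best
        conditions.append((a, test_x[a]))
        remaining.remove(a)
    return conditions, label, confidence


# Toy data (made up for illustration): attributes are (age group, symptom).
train_X = [("young", "yes"), ("young", "no"), ("old", "yes"),
           ("old", "no"), ("old", "yes"), ("young", "no")]
train_y = ["pos", "neg", "pos", "neg", "pos", "neg"]

rule, predicted, conf = personalized_rule(train_X, train_y, ("young", "yes"))
print(rule, predicted, conf)
```

Because every condition is taken from the test sample itself, the resulting rule always applies to that sample, which is what makes the explanation personalized in this sketch. A different test sample can yield a different rule from the same training data.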

