Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling

To increase the interpretability and prediction accuracy of Machine Learning (ML) models, visualization of the models is a key part of the ML process. Decision Trees (DTs) are essential in ML because they are used to understand many black-box ML models, including Deep Learning models. This research proposes two new methods for creating and enhancing DTs as understandable models with complete visualization. These methods use two versions of General Line Coordinates (GLC): Bended Coordinates (BC) and Shifted Paired Coordinates (SPC). Bended Coordinates are a set of line coordinates in which each coordinate is bent at the threshold point of the respective DT node. In SPC, each n-D point is visualized as a directed graph in a set of shifted pairs of 2-D Cartesian coordinates. These new methods expand and complement the capabilities of existing methods to visualize DT models more completely. They make it possible to observe and analyze: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) sensitivity of each split threshold in the DT nodes, and (5) density of cases in parts of the n-D space. These features are critical for performance evaluation and improvement of DT models by domain experts and end users, as they help to prevent overgeneralization and overfitting. The advantages of this methodology are illustrated in case studies on real-world benchmark datasets. The paper also demonstrates how to generalize these methods to decision tree visualizations in other General Line Coordinates.
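The SPC mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `spc_graph`, the `shifts` parameter, the default horizontal offset of 2.0 per pair, and the choice to reject odd-dimensional points are all assumptions made for illustration. The sketch splits an n-D point into consecutive pairs, places each pair in its own shifted 2-D Cartesian system, and links the resulting 2-D nodes with directed edges.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_graph(
    point: Sequence[float],
    shifts: Optional[Sequence[Point2D]] = None,
) -> Tuple[List[Point2D], List[Tuple[Point2D, Point2D]]]:
    """Map an n-D point to a directed graph of 2-D nodes (SPC-style sketch).

    Hypothetical helper: pair k is (x[2k], x[2k+1]) and is drawn in a
    Cartesian system whose origin is moved to shifts[k].
    """
    if len(point) % 2 != 0:
        # Assumption: this sketch only handles an even number of attributes.
        raise ValueError("expected an even number of coordinates")

    n_pairs = len(point) // 2
    if shifts is None:
        # Assumption: place the pairwise systems side by side, 2.0 apart.
        shifts = [(2.0 * k, 0.0) for k in range(n_pairs)]
    if len(shifts) != n_pairs:
        raise ValueError("need exactly one shift per coordinate pair")

    # One 2-D node per pair, positioned in the shared drawing plane.
    nodes = [
        (point[2 * k] + shifts[k][0], point[2 * k + 1] + shifts[k][1])
        for k in range(n_pairs)
    ]
    # Directed edges connect consecutive pairs: node 0 -> node 1 -> ...
    edges = list(zip(nodes[:-1], nodes[1:]))
    return nodes, edges


nodes, edges = spc_graph([0.25, 0.5, 0.75, 1.0])
print(nodes)
print(edges)
```

With this layout a 4-D case becomes two nodes joined by one arrow, so many cases drawn together form a bundle of arrows whose overlap hints at where cases are dense in the n-D space.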

