This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. The graphic representation of data in parallel coordinates makes the concepts of hypercubes and hyperblocks (HBs) simple for end users to understand. The proposed classification algorithm, Hyper, uses both mixed and pure hyperblocks. It is shown that Hyper models generalize decision trees. The algorithm is presented with several settings and options for discovering overlapping or non-overlapping hyperblocks, either interactively or automatically. Additionally, the use of hyperblocks in conjunction with language descriptions of visual patterns is demonstrated. The Hyper algorithm was evaluated on benchmark data from the UCI ML repository, where it discovered mixed and pure HBs that were evaluated using 10-fold cross validation. Connections among hyperblocks, dimension reduction, and visualization are established. Major advantages of hyperblock technology and the Hyper algorithm include the capability of end users to find and observe hyperblocks and the ability of side-by-side visualizations to make patterns evident. A new method is proposed for visualizing incomplete n-D data with missing values, which traditional parallel coordinates do not support. HBs are also shown to prevent both overgeneralization and overfitting better than decision trees, which is another benefit of hyperblocks. The features of the VisCanvas 2.0 software tool, which implements the Hyper technology, are presented.
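To make the notion of pure and mixed hyperblocks concrete, the following minimal sketch treats a hyperblock as an axis-aligned box given by one interval per attribute. It is an illustration only and not the Hyper algorithm itself. The names `HyperBlock`, `envelope`, `classes_inside`, and `is_pure` are hypothetical, and the sketch assumes that "pure" means the block contains cases of a single class while "mixed" means it contains cases of several classes.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


@dataclass
class HyperBlock:
    """Axis-aligned n-D box: one closed interval [low, high] per attribute (assumed form)."""
    lows: List[float]
    highs: List[float]

    def contains(self, point: Sequence[float]) -> bool:
        # A point is inside when every coordinate falls within its interval.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lows, point, self.highs))


def envelope(points: Sequence[Point]) -> HyperBlock:
    """Smallest hyperblock that covers all given points."""
    columns = list(zip(*points))
    return HyperBlock([min(c) for c in columns], [max(c) for c in columns])


def classes_inside(block: HyperBlock, points: Sequence[Point], labels: Sequence[str]) -> Set[str]:
    """Set of class labels of the cases that fall inside the block."""
    return {y for p, y in zip(points, labels) if block.contains(p)}


def is_pure(block: HyperBlock, points: Sequence[Point], labels: Sequence[str]) -> bool:
    # Assumption: a block is pure when it holds cases of at most one class.
    return len(classes_inside(block, points, labels)) <= 1


# Toy 2-D data, invented for illustration only.
points = [(1.0, 2.0), (2.0, 3.0), (5.0, 5.0), (1.5, 2.5)]
labels = ["a", "a", "b", "b"]

# Envelope of the class "a" cases; it also captures the "b" case (1.5, 2.5), so it is mixed.
block_a = envelope([p for p, y in zip(points, labels) if y == "a"])
# Degenerate block around the single case (5.0, 5.0); it is pure.
block_b = envelope([(5.0, 5.0)])

print(is_pure(block_a, points, labels), is_pure(block_b, points, labels))
```

Under this reading, each leaf of a decision tree corresponds to one such box, since a root-to-leaf path is a conjunction of per-attribute threshold tests. In parallel coordinates, a box of this kind appears as one interval on each axis.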