We introduce a method for computing classifiers from tabular data that are immediately human-interpretable yet accurate. The classifiers are short Boolean formulas, computed by first discretizing the original data and then combining feature selection with a very fast algorithm that produces the best possible Boolean classifier for the setting. We demonstrate the approach in 13 experiments, obtaining accuracies comparable to those of random forests, XGBoost, and results previously reported in the literature for the same datasets. In most cases, the accuracy of our method is in fact similar to that of the reference methods, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier obtained from real-life data corresponds to the ideally best classifier with respect to the background distribution from which the data is drawn.
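The pipeline described above (discretize, select features, compute the best Boolean classifier over them) can be illustrated with a minimal sketch. It is not the implementation used in the experiments, and it rests on several assumptions of ours: features are binarized at the column median, feature selection is an exhaustive search over all subsets of size `k`, labels are 0/1, and the function names (`binarize`, `best_classifier`, `fit`, `to_dnf`) are hypothetical. The one step that is not a design choice is the inner one: for a fixed set of Boolean features, predicting the majority label within each truth-table row maximizes empirical accuracy.

```python
from collections import Counter, defaultdict
from itertools import combinations
from statistics import median


def binarize(rows):
    """Discretize numeric columns to 0/1 at the column median (assumed scheme)."""
    thresholds = [median(col) for col in zip(*rows)]
    return [tuple(int(v > t) for v, t in zip(row, thresholds)) for row in rows]


def best_classifier(bits, labels, features):
    """Accuracy-maximizing Boolean function over the chosen features.

    Returns the truth-table rows mapped to 1 and the number of training
    points classified correctly. Ties are resolved towards label 0.
    """
    counts = defaultdict(Counter)
    for row, y in zip(bits, labels):
        counts[tuple(row[f] for f in features)][y] += 1
    true_rows = sorted(key for key, c in counts.items() if c[1] > c[0])
    correct = sum(max(c[0], c[1]) for c in counts.values())
    return true_rows, correct


def fit(rows, labels, k=2):
    """Exhaustive search over k-feature subsets (assumed selection strategy)."""
    bits = binarize(rows)
    best = None
    for features in combinations(range(len(rows[0])), k):
        true_rows, correct = best_classifier(bits, labels, features)
        if best is None or correct > best[2]:
            best = (features, true_rows, correct)
    features, true_rows, correct = best
    return features, true_rows, correct / len(rows)


def to_dnf(features, true_rows):
    """Render the classifier as a DNF formula over x0, x1, ..."""
    terms = []
    for assignment in true_rows:
        literals = [f"x{f}" if bit else f"~x{f}" for f, bit in zip(features, assignment)]
        terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms) or "False"
```

On a toy dataset with three numeric columns whose label is the conjunction of the first two binarized features, `fit(rows, labels, k=2)` selects features `(0, 1)` and `to_dnf` renders the result as `(x0 & x1)`. The exhaustive search is only practical for small `k` and few columns; it is used here for clarity rather than speed.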