We introduce a method for computing immediately human-interpretable yet accurate classifiers from tabular data. The classifiers obtained are short DNF formulas, computed by first discretizing the original data to Boolean form and then combining feature selection with a very fast algorithm that produces the best possible Boolean classifier for the setting. We demonstrate the approach via 14 experiments, obtaining accuracies that are mostly comparable to those of random forests, XGBoost, and existing results reported for the same datasets in the literature. In several cases, our approach in fact outperforms the reference results in accuracy, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier we obtain from real-life data corresponds to the ideally best classifier with respect to the background distribution the data comes from.
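The pipeline described above (Boolean discretization, a small set of selected features, then the best Boolean classifier over those features) can be illustrated with a minimal sketch. It is not the implementation used in the experiments and rests on several assumptions: numeric columns are thresholded at their median, the feature subset is fixed by hand rather than selected, "best" is read as minimal empirical error, and ties go to the negative class. All function names (`booleanize`, `best_dnf`, `predict`, `format_dnf`) are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def booleanize(rows):
    """Turn numeric rows into Boolean rows by thresholding each column
    at its median (an assumed discretization scheme)."""
    thresholds = [median(col) for col in zip(*rows)]
    return [tuple(v > t for v, t in zip(row, thresholds)) for row in rows]


def best_dnf(bool_rows, labels, features):
    """For each value pattern over the chosen features, take a majority
    vote of the labels. The patterns that vote positive are the terms of
    a DNF with minimal empirical error over those features.
    Ties are resolved to the negative class (an assumption)."""
    counts = defaultdict(Counter)
    for row, label in zip(bool_rows, labels):
        pattern = tuple(row[i] for i in features)
        counts[pattern][label] += 1
    return {p for p, c in counts.items() if c[True] > c[False]}


def predict(terms, row, features):
    """A row is classified positive iff its pattern is one of the terms."""
    return tuple(row[i] for i in features) in terms


def format_dnf(terms, features, names):
    """Render the terms as a readable disjunction of conjunctions."""
    if not terms:
        return "False"
    rendered = []
    for term in sorted(terms, reverse=True):
        literals = [
            names[i] if value else "not " + names[i]
            for i, value in zip(features, term)
        ]
        rendered.append("(" + " and ".join(literals) + ")")
    return " or ".join(rendered)


# Toy data: two numeric attributes, Boolean target.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
labels = [False, False, True, True]
features = [0]  # hand-picked here; a real pipeline would select these

bool_rows = booleanize(rows)
terms = best_dnf(bool_rows, labels, features)
formula = format_dnf(terms, features, ["x0", "x1"])
accuracy = sum(
    predict(terms, row, features) == label
    for row, label in zip(bool_rows, labels)
) / len(labels)
print(formula, accuracy)
```

With few selected features the number of possible patterns stays small, which is what keeps the resulting formula short enough to read directly.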