Decision trees are easy to interpret because they classify input data using if-then rules. However, the algorithms that build decision trees aim for a clean classification with the minimum number of rules necessary. As a result, a single tree extracts only those minimal rules, even when the data contain various other latent rules. Existing approaches construct multiple trees from randomly selected feature subsets. Because the number of possible feature subsets grows combinatorially with the number of features, the trees these approaches can actually construct cover only a small fraction of all possible trees. Moreover, constructing multiple trees generates numerous rules, some of which are unreliable, highly similar to one another, or both. We therefore propose two algorithms. MAABO-MT strategically constructs trees with high estimation performance from among all possible trees at low computational complexity. GS-MRM extracts only reliable and mutually dissimilar rules. Experiments on several open datasets are conducted to evaluate the effectiveness of the proposed method. The results confirm that MAABO-MT discovers reliable rules at a lower computational cost than other methods that rely on randomness. Furthermore, the proposed method is shown to provide deeper insights than the single decision trees commonly used in previous studies. Thus, MAABO-MT and GS-MRM can efficiently extract rules from the combinatorially large space of possible decision trees.
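The following minimal Python sketch makes the combinatorial-explosion argument concrete. With d features there are sum over k = 1..d of C(d, k) = 2^d - 1 non-empty feature subsets. A fixed budget of trees, each built on a distinct subset, can therefore cover at most budget / (2^d - 1) of them. The tree budget of 500 and the feature counts below are arbitrary illustrative values, not figures from the experiments. The helper names are hypothetical, and the sketch does not implement MAABO-MT or GS-MRM.

```python
from math import comb
from typing import Optional


def num_feature_subsets(n_features: int, max_size: Optional[int] = None) -> int:
    """Count non-empty feature subsets with at most max_size features (all sizes if None)."""
    k_max = n_features if max_size is None else min(max_size, n_features)
    return sum(comb(n_features, k) for k in range(1, k_max + 1))


def coverage(n_trees: int, n_features: int) -> float:
    """Upper bound on the fraction of subsets covered if each tree uses a distinct subset."""
    total = num_feature_subsets(n_features)
    return min(n_trees, total) / total


# Illustrative budget (hypothetical): 500 trees, as in a typical random-subset ensemble.
for d in (10, 20, 30, 50):
    total = num_feature_subsets(d)
    print(f"d={d:>2}: {total:,} subsets; 500 trees cover at most {coverage(500, d):.2e}")
```

Even for moderate d, the covered fraction shrinks roughly by half with each added feature. This is why relying on randomness alone leaves most candidate trees unexplored.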