Given the growing interest in interpretable machine learning, classification trees have regained the attention of the scientific community because of their glass-box structure. These models are usually built with greedy procedures that solve a sequence of subproblems, each finding a cut in the feature space that minimizes an impurity measure. In contrast to this standard greedy approach, and to recent MILP-based exact formulations of the learning problem, in this paper we propose a novel evolutionary algorithm for the induction of classification trees. The algorithm follows a memetic approach and can handle datasets with thousands of points. Our procedure combines exploration of the feasible solution space with local searches to obtain trees whose generalization capabilities are competitive with state-of-the-art methods.
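To make the memetic idea concrete, the following is a minimal sketch of how an evolutionary search over trees can be interleaved with a local search. It is not the algorithm proposed in the paper: the fixed tree depth, the heap-ordered encoding, the truncation selection, the uniform crossover, the single-node hill climb, and all names (`memetic_tree`, `local_search`, `DEPTH`, and so on) are assumptions chosen only for illustration.

```python
import random
from collections import Counter

DEPTH = 2                      # assumed fixed tree depth (illustrative choice)
N_INTERNAL = 2 ** DEPTH - 1    # internal nodes, stored in heap order

def leaf_index(tree, x):
    """Route point x through the splits and return the index of its leaf."""
    node = 0
    for _ in range(DEPTH):
        feature, threshold = tree[node]
        node = 2 * node + (1 if x[feature] <= threshold else 2)
    return node - N_INTERNAL

def leaf_labels(tree, X, y):
    """Label each leaf with the majority class of the training points it receives."""
    counts = [Counter() for _ in range(2 ** DEPTH)]
    for x, label in zip(X, y):
        counts[leaf_index(tree, x)][label] += 1
    default = Counter(y).most_common(1)[0][0]   # fallback for empty leaves
    return [c.most_common(1)[0][0] if c else default for c in counts]

def accuracy(tree, X, y):
    """Training accuracy, used here as the fitness of a tree."""
    labels = leaf_labels(tree, X, y)
    hits = sum(labels[leaf_index(tree, x)] == t for x, t in zip(X, y))
    return hits / len(y)

def random_split(X, rng):
    """Draw a random axis-aligned cut at a feature value observed in the data."""
    feature = rng.randrange(len(X[0]))
    return (feature, rng.choice(X)[feature])

def local_search(tree, X, y, rng):
    """Hill-climb one randomly chosen node over every observed cut."""
    node = rng.randrange(N_INTERNAL)
    best, best_fit = tree, accuracy(tree, X, y)
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            cand = tree[:node] + [(feature, threshold)] + tree[node + 1:]
            fit = accuracy(cand, X, y)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best

def memetic_tree(X, y, pop_size=12, generations=15, mutation_rate=0.2, seed=0):
    """Evolve a population of trees, refining every child by local search."""
    rng = random.Random(seed)
    pop = [[random_split(X, rng) for _ in range(N_INTERNAL)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, X, y), reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            if rng.random() < mutation_rate:                   # random-reset mutation
                child[rng.randrange(N_INTERNAL)] = random_split(X, rng)
            children.append(local_search(child, X, y, rng))    # the memetic step
        pop = parents + children
    return max(pop, key=lambda t: accuracy(t, X, y))
```

Calling `memetic_tree(X, y)` on a list of feature vectors `X` and a list of labels `y` returns a list of `(feature, threshold)` pairs, one per internal node. The sketch uses training accuracy as fitness for brevity; a practical version would evaluate generalization on held-out data.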