Deploying machine learning models in sensitive domains of society requires these models to be explainable. Genetic Programming (GP) can offer a way to evolve inherently interpretable expressions. GP-GOMEA is a form of GP that has been found particularly effective at evolving expressions that are accurate yet of limited size and thus promote interpretability. Despite this strength, a limitation of GP-GOMEA is that it is template-based. This hampers its scalability with respect to the arity of the operators that can be used: as operator arity increases, an increasingly large part of the template tends to go unused. In this paper, we therefore propose two enhancements to GP-GOMEA: (i) semantic subtree inheritance, which performs additional variation steps that consider the semantic context of a subtree, and (ii) greedy child selection, which explicitly considers parts of the template that remain unused in standard GP-GOMEA. We compare versions of GP-GOMEA with and without these search enhancements on a set of continuous and discontinuous regression problems, using varying tree depths and operator sets. Experimental results show that both proposed enhancements generally have a positive impact on the performance of GP-GOMEA, especially when the operator set is large and contains higher-arity operators.
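To make the arity argument concrete, the Python sketch below assumes a GP-GOMEA-style fixed template. In this template, every node has as many child slots as the largest operator arity in the set, the tree is full up to a chosen depth, and nodes are stored in prefix order. The helper names (`template_size`, `encode`, `count_active`) and the `?` placeholder for unused slots are illustrative assumptions, not GP-GOMEA's actual implementation. The sketch encodes one depth-2 expression that uses only binary operators, then counts how many template nodes actually contribute to it as the maximum arity grows.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical expression format: a terminal is a str, an operator node is a tuple (op, *children).
Expr = Union[str, Tuple]


def template_size(max_arity: int, depth: int) -> int:
    """Nodes in a full tree with `max_arity` child slots per node and the root at depth 0."""
    if max_arity == 1:
        return depth + 1
    return (max_arity ** (depth + 1) - 1) // (max_arity - 1)


def encode(expr: Expr, max_arity: int, depth: int, level: int = 0, filler: str = "?") -> List[str]:
    """Lay out `expr` in prefix order on the fixed template; unused child slots get `filler` subtrees."""
    label, args = (expr, ()) if isinstance(expr, str) else (expr[0], expr[1:])
    out = [label]
    if level == depth:  # nodes at maximum depth have no child slots
        assert not args, "expression is deeper than the template"
        return out
    for child in range(max_arity):
        if child < len(args):
            out += encode(args[child], max_arity, depth, level + 1, filler)
        else:
            # The whole child subtree is unused: pad it with placeholder nodes.
            out += [filler] * template_size(max_arity, depth - level - 1)
    return out


def count_active(template: List[str], arity: Dict[str, int], max_arity: int, depth: int) -> int:
    """Decode a prefix-order template and count the nodes that contribute to the expression."""
    pos = 0

    def visit(level: int, active: bool) -> int:
        nonlocal pos
        label = template[pos]
        pos += 1
        used = 1 if active else 0
        if level == depth:
            return used
        # Only an active node's first `arity` children are used; the rest are skipped.
        n_args = arity[label] if active else 0
        for child in range(max_arity):
            used += visit(level + 1, active and child < n_args)
        return used

    total = visit(0, True)
    assert pos == len(template), "template length does not match max_arity/depth"
    return total


# A depth-2 expression using only binary operators: (x * y) + (z - x), 7 nodes.
EXPR: Expr = ("+", ("*", "x", "y"), ("-", "z", "x"))
ARITY = {"+": 2, "*": 2, "-": 2, "x": 0, "y": 0, "z": 0}
DEPTH = 2

if __name__ == "__main__":
    for r in (2, 3, 4):  # maximum operator arity in the (hypothetical) operator set
        genome = encode(EXPR, r, DEPTH)
        active = count_active(genome, ARITY, r, DEPTH)
        print(r, len(genome), active, round(active / len(genome), 2))
```

Running it prints `2 7 7 1.0`, `3 13 7 0.54`, and `4 21 7 0.33`. The same seven-node expression fills the whole template when the maximum arity is 2, but only about a third of it once a 4-ary operator is in the set. The reason is that the template holds (r^(d+1) - 1)/(r - 1) nodes for maximum arity r > 1 and depth d, while an expression built from low-arity operators does not grow with r.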