Gene expression programming is an evolutionary optimization algorithm that can generate interpretable, easily implementable equations for regression problems. Although knowledge gained from previous optimizations may be available, the initial candidate solutions are typically generated randomly and often include only features or terms based on preliminary user assumptions. Such a random initial guess places no constraints on the search space and typically increases the computational cost of finding an optimal solution. Meanwhile, transfer learning, a technique for reusing parts of trained models, has been applied successfully to neural networks. However, no generalized strategy exists for applying it to symbolic regression in the context of evolutionary algorithms. In this work, we propose an approach for integrating transfer learning with gene expression programming applied to symbolic regression. The resulting framework uses natural language processing techniques to identify correlations and recurring patterns in equations explored during previous optimizations, which allows knowledge acquired on similar tasks to be transferred to new ones. We evaluate the extended framework empirically on a range of univariate problems from an open database and from the field of computational fluid dynamics. The results show that initial solutions derived through the transfer learning mechanism improve the algorithm's convergence rate towards better solutions.
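To make the idea of seeding the initial population from previously explored equations concrete, the following is a minimal sketch and not the framework described above. It assumes that earlier equations are available as token lists in prefix notation, and it substitutes a simple bigram count model for the natural language processing component. The names `PREVIOUS`, `ARITY`, `fit_bigrams`, `sample_sequence`, `is_valid_prefix`, and `seed_population` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

# Assumed operator arities; any other token is treated as a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "sin": 1, "exp": 1}

# Hypothetical equations from earlier runs, tokenized in prefix notation.
PREVIOUS = [
    ["+", "x", "1"],
    ["*", "x", "x"],
    ["+", "*", "x", "x", "x"],
    ["sin", "x"],
    ["+", "sin", "x", "x"],
]


def fit_bigrams(equations):
    """Count token-to-token transitions over the tokenized equations."""
    counts = defaultdict(Counter)
    for tokens in equations:
        seq = [START, *tokens, END]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sequence(counts, rng, max_len=15):
    """Sample one token sequence by walking the bigram counts."""
    tokens, prev = [], START
    while len(tokens) < max_len:
        choices = counts[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == END:
            break
        tokens.append(nxt)
        prev = nxt
    return tokens


def is_valid_prefix(tokens):
    """Return True if the tokens form exactly one complete prefix expression."""
    need = 1  # number of sub-expressions still required
    for tok in tokens:
        if need == 0:
            return False  # trailing tokens after a complete expression
        need += ARITY.get(tok, 0) - 1
    return need == 0


def seed_population(counts, size, rng, max_tries=1000):
    """Collect up to `size` syntactically valid sampled expressions."""
    seeds = []
    for _ in range(max_tries):
        if len(seeds) == size:
            break
        candidate = sample_sequence(counts, rng)
        if is_valid_prefix(candidate):
            seeds.append(candidate)
    return seeds


counts = fit_bigrams(PREVIOUS)
for expr in seed_population(counts, size=5, rng=random.Random(0)):
    print(" ".join(expr))
```

In this sketch, sampled sequences that are not valid expressions are simply discarded. An actual gene expression programming implementation would more likely encode candidates in its own fixed-length gene representation and fill any remaining population slots with randomly generated individuals.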