GP-GOMEA is a state-of-the-art evolutionary algorithm for symbolic regression, known for discovering small and interpretable models. However, its computational cost remains substantial, limiting its applicability to larger datasets and more complex target expressions. In contrast, the rise of modern subsymbolic approaches, particularly deep learning, has been driven largely by the massive parallelism offered by GPUs. In this work, we take the first major step toward a fully GPU-accelerated GP-GOMEA by introducing a GPU-based fitness evaluation scheme. We design a GPU-friendly representation of GP-GOMEA's template-based individuals and a corresponding evaluation strategy that exploits the inherent parallelism of population-based search. This substantially increases evaluation throughput, enabling orders of magnitude more evaluations within the same time budget. Across four standard symbolic regression benchmarks, this increased evaluation capacity yields performance improvements, particularly for larger datasets and larger population sizes. Moreover, the ability to efficiently evaluate much larger datasets and more complex templates enables analyses that were previously infeasible: we systematically analyze what makes expressions increasingly difficult for GP-GOMEA, which provides new insights into how expression structure affects search difficulty. Finally, for the first time, this expanded capability allows a problem-agnostic evolutionary algorithm to reliably regress one of the largest Feynman equations within four hours.