Computational cost is often a major concern in metaheuristics such as Evolutionary Algorithms (EAs), particularly with respect to their ability to scale. In data-driven training, traditional EAs typically use a significant portion, if not all, of the dataset for model training and fitness evaluation in each generation. As a result, EAs incur high computational costs when evaluating the fitness of the population, particularly on large datasets. To mitigate this issue, we propose a Machine Learning (ML)-driven Distance-based Selection (DBS) algorithm that reduces fitness evaluation time by optimizing test cases. We test the algorithm on 24 benchmark problems from the Symbolic Regression (SR) and digital circuit domains, using Grammatical Evolution (GE) to train models on the reduced datasets. We use GE to test DBS on SR and produce a system flexible enough to test it further on digital circuit problems. The quality of the solutions is tested and compared against the conventional training method to measure the coverage of the training data selected using DBS, i.e., how well the subset matches the statistical properties of the entire dataset. Moreover, we analyze the effect of the optimized training data on run time and on the effective size of the evolved solutions. Experimental and statistical evaluations show that our method enables GE to yield solutions that are superior or comparable to the baseline (which uses the full datasets) while being smaller in size, and that it demonstrates computational efficiency in terms of speed.
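The abstract does not specify how DBS measures distance or chooses test cases, so the following is only an illustrative sketch of the general idea of distance-based training-data reduction, not the authors' algorithm. It assumes Euclidean distance and substitutes a plain greedy farthest-point selection for the ML-driven step; the function names `farthest_point_subset` and `column_mean_gap` are hypothetical. The second function is a crude stand-in for a coverage check, comparing column means of the subset against those of the full dataset.

```python
import math
import random
import statistics


def farthest_point_subset(rows, k, start=0):
    """Greedy farthest-point selection (an assumed stand-in for DBS).

    Returns the sorted indices of k rows chosen so that each new row is
    the one farthest from all rows picked so far.
    """
    if not 0 < k <= len(rows):
        raise ValueError("k must be between 1 and len(rows)")
    chosen = [start]
    # Distance from every row to its nearest already-chosen row.
    nearest = [math.dist(r, rows[start]) for r in rows]
    nearest[start] = -1.0  # mark as taken so it is never picked again
    while len(chosen) < k:
        nxt = max(range(len(rows)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(r, rows[nxt])) for d, r in zip(nearest, rows)]
        nearest[nxt] = -1.0
    return sorted(chosen)


def column_mean_gap(rows, subset_idx):
    """Largest absolute gap between full-data and subset column means."""
    full_cols = zip(*rows)
    sub_cols = zip(*(rows[i] for i in subset_idx))
    return max(
        abs(statistics.fmean(f) - statistics.fmean(s))
        for f, s in zip(full_cols, sub_cols)
    )


# Hypothetical usage: keep 20 of 200 synthetic two-variable test cases.
rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
subset = farthest_point_subset(data, 20)
print(len(subset), column_mean_gap(data, subset))
```

In a setting like the one described, fitness evaluation would then run only on the rows indexed by `subset`, so its cost per individual scales with the subset size rather than the full dataset size. A smaller mean gap suggests the subset tracks the full data more closely, although matching means alone is a weak notion of coverage.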