Parent selection plays an important role in evolutionary algorithms, and many strategies exist for selecting the parent pool before breeding the next generation. Many methods use the average error over the entire dataset as the selection criterion, which can lose information because performance on all test cases is aggregated into a single value. Under epsilon-lexicase selection, the population instead enters a selection pool that is iteratively reduced using one test case at a time: individuals whose error exceeds the elite error plus the median absolute deviation (MAD) of errors on that test case are discarded. To better capture differences in the performance of individuals across cases, we propose a new criterion that splits the errors into two partitions such that the total variance within partitions is minimized. We embedded our method in the FEAT symbolic regression algorithm and evaluated it with the SRBench framework, which contains 122 black-box synthetic and real-world regression problems. The empirical results show that our approach outperforms traditional epsilon-lexicase selection on the real-world datasets and performs equivalently on the synthetic ones.
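The two filtering criteria described above can be illustrated with a minimal Python sketch. It is not the FEAT implementation; the function names (`mad_threshold`, `split_threshold`, `lexicase_select`) are hypothetical, and several details are assumptions rather than statements from the abstract: the MAD is computed over the current pool, "total variance within partitions" is read as the summed squared deviation from each partition's mean, and the individuals in the lower-error partition are the ones kept.

```python
import random
from statistics import median


def mad_threshold(errors):
    """Elite (minimum) error plus the median absolute deviation of `errors`."""
    med = median(errors)
    mad = median(abs(e - med) for e in errors)
    return min(errors) + mad


def split_threshold(errors):
    """Largest error in the lower of two partitions.

    The split point is chosen to minimise the summed within-partition
    squared deviation (an assumed reading of "total variance within
    partitions"); keeping the lower partition is also an assumption.
    """
    xs = sorted(errors)
    if xs[0] == xs[-1]:
        return xs[-1]  # all errors equal: nothing to split

    def sse(part):
        mean = sum(part) / len(part)
        return sum((x - mean) ** 2 for x in part)

    # Try every cut between consecutive sorted errors.
    best_k = min(range(1, len(xs)), key=lambda k: sse(xs[:k]) + sse(xs[k:]))
    return xs[best_k - 1]


def lexicase_select(errors, threshold_fn, rng=random):
    """Select one parent index.

    `errors[i][j]` is the error of individual i on test case j.
    `threshold_fn` maps the pool's errors on one case to a pass limit.
    """
    pool = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event uses a fresh case order
    for j in cases:
        if len(pool) == 1:
            break
        case_errors = [errors[i][j] for i in pool]
        limit = threshold_fn(case_errors)
        pool = [i for i in pool if errors[i][j] <= limit]
    return rng.choice(pool)  # break remaining ties at random
```

Passing `mad_threshold` gives the MAD-based filter and passing `split_threshold` gives the partition-based one, so the two criteria can be compared on the same error matrix. The exhaustive search over cut points is quadratic in the pool size, which keeps the sketch short; a running-sum formulation would be the natural choice for large populations.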