Genetic programming (GP) systems often use large training sets to evaluate the quality of candidate solutions for selection, which is often computationally expensive. Down-sampling training sets has long been used to decrease the computational cost of evaluation in a wide range of application domains. More specifically, recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success for GP systems that use the lexicase parent selection algorithm. We test whether these down-sampling techniques can also improve problem-solving success in the context of three other commonly used selection methods (fitness-proportionate selection, tournament selection, and implicit fitness sharing plus tournament selection) across six program synthesis GP problems. We find that down-sampling can significantly improve problem-solving success for all three of these other selection schemes, demonstrating its general efficacy. We also observe that the selection pressure imposed by the selection scheme does not interact with the down-sampling method. However, informed down-sampling can improve problem-solving success significantly over random down-sampling when the selection scheme has a mechanism for diversity maintenance, such as lexicase selection or implicit fitness sharing. Overall, our results suggest that down-sampling should be considered more often when solving test-based problems, regardless of the selection scheme in use.
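As a rough illustration of the idea, random down-sampling combined with tournament selection might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names, the `rate` and `tournament_size` parameters, and the use of summed absolute error as the aggregate fitness are all hypothetical choices made for the example.

```python
import random


def random_down_sample(cases, rate, rng):
    """Return a random subset of the training cases (at least one case)."""
    k = max(1, round(len(cases) * rate))
    return rng.sample(cases, k)


def total_error(program, cases):
    """Sum of absolute errors of `program` over (input, target) pairs.

    Assumption: programs are plain callables and errors are aggregated
    by summation; a real GP system may aggregate differently.
    """
    return sum(abs(program(x) - y) for x, y in cases)


def tournament_select(population, errors, tournament_size, rng):
    """Draw `tournament_size` distinct contestants; the lowest error wins."""
    contestants = rng.sample(range(len(population)), tournament_size)
    winner = min(contestants, key=lambda i: errors[i])
    return population[winner]


def select_parents(population, cases, rate, tournament_size, rng):
    """Select one parent per population slot for a single generation.

    Every individual is evaluated only on the down-sample, so the
    evaluation cost per generation shrinks roughly in proportion to `rate`.
    """
    sample = random_down_sample(cases, rate, rng)
    errors = [total_error(p, sample) for p in population]
    return [
        tournament_select(population, errors, tournament_size, rng)
        for _ in population
    ]
```

In this sketch a fresh sample is drawn on each call to `select_parents`, so over many generations individuals are still exposed to most of the training set. An informed down-sampling variant would replace `random_down_sample` with a procedure that chooses cases based on information about the population's behavior rather than uniformly at random.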