Evolutionary Algorithms (EAs) are often challenging to apply in real-world settings because evolutionary computation involves a large number of evaluations of a typically expensive fitness function. For example, a single evaluation could involve training a new machine learning model. An approximation of the true function (also known as a meta-model or surrogate) can be used in such applications to reduce the computational cost. In this paper, we propose a two-stage surrogate-assisted evolutionary approach to address the computational issues arising from using a Genetic Algorithm (GA) for feature selection in a wrapper setting on large datasets. We define 'Approximation Usefulness' to capture the necessary conditions that ensure the correctness of EA computations when an approximation is used. Based on this definition, we propose a procedure that constructs a lightweight qualitative meta-model through active selection of data instances. We then use this meta-model to carry out the feature selection task. We apply this procedure to the GA-based algorithm CHC (Cross generational elitist selection, Heterogeneous recombination and Cataclysmic mutation) to create a Qualitative approXimations variant, CHCQX. Compared with CHC, CHCQX converges faster to feature subset solutions of significantly higher accuracy, particularly for large datasets with over 100K instances. We also demonstrate that the thinking behind our approach applies more broadly to Swarm Intelligence (SI), another branch of the Evolutionary Computation (EC) paradigm, with results for PSOQX, a qualitative approximation adaptation of the Particle Swarm Optimization (PSO) method. A GitHub repository with the complete implementation is available.
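The abstract does not spell out the formal definition of 'Approximation Usefulness'. As an illustration only, the sketch below assumes one plausible reading of a "qualitative" approximation: a cheap fitness estimate is useful to an EA if it orders candidate solutions the same way the true fitness does, even when the absolute values differ. The function name `pairwise_agreement` and the example fitness values are hypothetical and are not taken from the paper or its repository.

```python
from itertools import combinations


def pairwise_agreement(true_fitness, approx_fitness):
    """Fraction of candidate pairs ordered identically by both fitness lists.

    Assumption (not from the paper): an approximation is treated as
    'useful' when it preserves the relative ordering of candidates.
    """
    if len(true_fitness) != len(approx_fitness):
        raise ValueError("fitness lists must have the same length")

    pairs = list(combinations(range(len(true_fitness)), 2))
    if not pairs:
        # Zero or one candidate: there is no ordering to violate.
        return 1.0

    agree = 0
    for i, j in pairs:
        t = true_fitness[i] - true_fitness[j]
        a = approx_fitness[i] - approx_fitness[j]
        # Agreement means both differences have the same sign,
        # or both are exact ties.
        if (t > 0) == (a > 0) and (t < 0) == (a < 0):
            agree += 1
    return agree / len(pairs)


# Hypothetical accuracies of three feature subsets.
full_data = [0.90, 0.85, 0.70]   # expensive: evaluated on all instances
subsample = [0.80, 0.78, 0.60]   # cheap: evaluated on selected instances
score = pairwise_agreement(full_data, subsample)
```

Here the subsample underestimates every accuracy, yet it ranks the three subsets in the same order as the full evaluation, so a selection step driven by rank would make the same choices. A score well below 1 would suggest that the cheap estimate could mislead the search.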