Beyond Speedups: Hardware-Aware Evaluation of Evolutionary Algorithms on GPUs

Evolutionary algorithms (EAs) are increasingly executed on graphics processing units (GPUs) to exploit population-level parallelism. This shift changes the resource model under which EAs are designed and evaluated. However, many GPU-based EA studies still focus mainly on implementation-level speedup after porting CPU-oriented algorithms to GPUs, providing limited insight into how algorithmic mechanisms, function-evaluation (FE) budgets, population scales, and hardware utilization jointly affect optimization behavior. In response, this paper goes beyond speedup measurement and studies the scaling behavior of EAs on GPUs from a hardware-aware evaluation perspective. We evaluate 16 representative EAs on 30 benchmark problems across CPU and GPU platforms, covering single-objective optimization, multi-objective optimization, numerical benchmarks, and neuroevolution tasks. The study leads to four findings. First, GPU acceleration is highly heterogeneous across algorithms because different evolutionary mechanisms expose different degrees of batched computation, memory regularity, and synchronization. Second, FE-budgeted evaluation remains useful for measuring sample efficiency, but it provides only a limited observation window under GPU execution; time-budgeted evaluation is therefore necessary for assessing practical time-to-solution and long-horizon search behavior. Third, GPU effectiveness depends on scaling regimes induced by problem dimension and population size, where parallelism may be underutilized, effective, or saturated. Fourth, GPU execution makes very large populations practically affordable, and several evolutionary mechanisms can convert this increased population scale into improved optimization performance. These results indicate that GPU parallelism should not be treated only as a post hoc acceleration tool, but as part of the evaluation and design assumptions of scalable EAs.
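
To make the contrast between FE-budgeted and time-budgeted evaluation concrete, the following minimal sketch runs the same toy search under either stopping rule. It is an illustration only, not the evaluation harness used in this study: the names `sphere` and `run_es`, the (1+lambda)-style mutation scheme, and all parameter values are hypothetical, and the code runs in plain CPU Python rather than on a GPU. The point is that a fixed FE budget yields the same number of evaluations on any hardware, whereas a fixed time budget lets faster execution convert directly into more evaluations.

```python
import random
import time


def sphere(x):
    """Sphere benchmark: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def run_es(fitness, dim, pop_size, fe_budget=None, time_budget=None, seed=0):
    """Toy (1+lambda)-style search (hypothetical helper, for illustration).

    Stops when the next generation would exceed fe_budget evaluations,
    or when time_budget seconds have elapsed, whichever comes first.
    """
    if fe_budget is None and time_budget is None:
        raise ValueError("set fe_budget, time_budget, or both")
    rng = random.Random(seed)
    parent = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    best = fitness(parent)
    evals = 1  # the initial parent counts as one function evaluation
    start = time.perf_counter()
    while True:
        # FE-budgeted stopping rule: hardware-independent sample count.
        if fe_budget is not None and evals + pop_size > fe_budget:
            break
        # Time-budgeted stopping rule: sample count depends on throughput.
        if time_budget is not None and time.perf_counter() - start >= time_budget:
            break
        offspring = [
            [p + rng.gauss(0.0, 0.3) for p in parent] for _ in range(pop_size)
        ]
        scores = [fitness(o) for o in offspring]
        evals += pop_size
        i = min(range(pop_size), key=scores.__getitem__)
        if scores[i] < best:  # elitist replacement: best never gets worse
            best, parent = scores[i], offspring[i]
    return {"best": best, "evals": evals, "seconds": time.perf_counter() - start}


if __name__ == "__main__":
    by_fe = run_es(sphere, dim=10, pop_size=20, fe_budget=1001)
    by_time = run_es(sphere, dim=10, pop_size=20, time_budget=0.05)
    print("FE-budgeted:  ", by_fe)
    print("time-budgeted:", by_time)
```

Under the FE budget the evaluation count is fixed by construction, so only solution quality per sample is observed. Under the time budget the evaluation count varies with the speed of the platform, which is the quantity that matters for time-to-solution comparisons between CPU and GPU execution.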
