Dynamic Optimization Problems (DOPs) are challenging to address due to their complex nature, i.e., dynamic environment variation. Evolutionary computation methods are generally well suited to solving DOPs since they resemble dynamic biological evolution. However, existing evolutionary dynamic optimization methods rely heavily on human-crafted adaptive strategies to detect environment variation in DOPs and then adapt the search strategy accordingly. These hand-crafted strategies may perform poorly in out-of-the-box scenarios. In this paper, we propose a reinforcement learning-assisted approach that enables automated variation detection and self-adaptation in evolutionary algorithms. This is achieved by borrowing the bi-level learning-to-optimize idea from recent Meta-Black-Box Optimization works. We use a deep Q-network as an optimization dynamics detector and search strategy adapter: it takes the current-step optimization state as input and then dictates the desired control parameters for the underlying evolutionary algorithm's next optimization step. The learning objective is to maximize the expected performance gain across a problem distribution. Once trained, our approach can generalize to unseen DOPs with automated environment variation detection and self-adaptation. To facilitate comprehensive validation, we further construct an easy-to-difficult DOP testbed with diverse synthetic instances. Extensive benchmark results demonstrate the flexible search behavior and superior performance of our approach in solving DOPs compared to state-of-the-art baselines.