Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution

Evolutionary algorithms, such as Differential Evolution, excel at solving real-parameter optimization problems. However, the effectiveness of any single algorithm varies across problem instances, so considerable effort must be spent on algorithm selection or configuration. This paper addresses that limitation by leveraging the complementary strengths of a group of algorithms and dynamically scheduling them throughout the optimization process for each specific problem. We propose a deep reinforcement learning-based dynamic algorithm selection framework to accomplish this task. Our approach models dynamic algorithm selection as a Markov Decision Process and trains an agent with a policy gradient method to select the most suitable algorithm according to the features observed during the optimization process. To give the agent the information it needs, the framework uses a set of landscape and algorithmic features as its state representation, and a deep neural network model infers the action to take from those features. Additionally, an algorithm context restoration mechanism is embedded to facilitate smooth switching among different algorithms. Together, these mechanisms enable the framework to select and switch algorithms in a dynamic, online fashion. The proposed framework is simple and generic, and it can potentially be applied to a broad spectrum of evolutionary algorithms. As a proof-of-principle study, we apply the framework to a group of Differential Evolution algorithms. The experimental results show that the framework not only improves overall optimization performance but also generalizes favorably across different problem classes.
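To make the selection loop concrete, the following is a minimal Python sketch of the general idea, not the framework evaluated in this paper. Everything in it is an illustrative assumption: the algorithm pool is reduced to two mutation strategies (DE/rand/1 and DE/best/1), the state is three hand-picked features, the deep network is replaced by a linear softmax policy trained with plain REINFORCE, the reward is the normalized improvement of the best fitness, and the test problem is the sphere function. The context restoration mechanism is not modelled, since both strategies here share the same population and have no private state.

```python
import math
import random

def sphere(x):
    # Toy objective (assumption): minimum 0 at the origin.
    return sum(v * v for v in x)

def de_step(pop, fit, f_obj, strategy, rng, F=0.5, CR=0.9):
    # One DE generation. strategy 0 = DE/rand/1, strategy 1 = DE/best/1.
    n, d = len(pop), len(pop[0])
    best = pop[min(range(n), key=lambda i: fit[i])]
    for i in range(n):
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        base = pop[a] if strategy == 0 else best
        j_rand = rng.randrange(d)  # guarantees at least one mutated coordinate
        trial = [base[k] + F * (pop[b][k] - pop[c][k])
                 if (rng.random() < CR or k == j_rand) else pop[i][k]
                 for k in range(d)]
        ft = f_obj(trial)
        if ft <= fit[i]:  # greedy selection keeps the better vector
            pop[i], fit[i] = trial, ft

def features(fit, step, horizon):
    # Hypothetical state: bias, search progress, relative fitness spread.
    best, mean = min(fit), sum(fit) / len(fit)
    return [1.0, step / horizon, (mean - best) / (abs(mean) + 1e-12)]

def policy(theta, s):
    # Linear softmax policy: one weight row per candidate algorithm.
    logits = [sum(w * x for w, x in zip(row, s)) for row in theta]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(theta, rng, horizon=20, n=12, d=5):
    # One optimization run; the agent picks a strategy at every generation.
    pop = [[rng.uniform(-5, 5) for _ in range(d)] for _ in range(n)]
    fit = [sphere(x) for x in pop]
    init_best = min(fit)
    trajectory = []
    for step in range(horizon):
        s = features(fit, step, horizon)
        p = policy(theta, s)
        a = rng.choices(range(len(p)), weights=p)[0]
        before = min(fit)
        de_step(pop, fit, sphere, a, rng)
        reward = (before - min(fit)) / (init_best + 1e-12)
        trajectory.append((s, a, p, reward))
    return trajectory, min(fit), init_best

def train(episodes=30, lr=0.5, seed=0):
    # REINFORCE: theta += lr * return_to_go * grad log pi(a | s).
    rng = random.Random(seed)
    theta = [[0.0] * 3 for _ in range(2)]
    for _ in range(episodes):
        traj, _, _ = run_episode(theta, rng)
        ret = 0.0
        for s, a, p, r in reversed(traj):
            ret += r  # undiscounted return-to-go
            for k in range(len(theta)):
                coeff = (1.0 if k == a else 0.0) - p[k]
                for j in range(len(s)):
                    theta[k][j] += lr * ret * coeff * s[j]
    return theta

if __name__ == "__main__":
    theta = train()
    _, final_best, init_best = run_episode(theta, random.Random(123))
    print(f"initial best: {init_best:.4g}, final best: {final_best:.4g}")
```

Calling `train()` returns the learned weight matrix, and `policy(theta, features(...))` gives the selection probabilities for a given state. A sketch closer to the abstract would swap the linear policy for a neural network and would save and restore each algorithm's private state (for example, adaptive parameter memories) whenever the agent switches between algorithms.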
