Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization

Constrained multi-objective optimization problems (CMOPs) arise in many practical applications across science and engineering. Most existing constrained multi-objective evolutionary algorithms (CMOEAs) employ fixed operators throughout the search, which limits their versatility across different CMOPs. Some recent studies have therefore focused on adaptively selecting the best operator for the current population state during the search. The evolutionary algorithms proposed in these studies learn the value of each operator and recommend the one with the highest value for the current population. As a result, only a single operator is recommended at each generation, which can lead to local optima and inefficient use of function evaluations. To address this dilemma in operator adaptation, this paper proposes a reinforcement learning-based automated operator portfolio approach that learns an allocation scheme of operators at each generation. The approach treats the optimization-related and constraint-related features of the current population as states, the overall improvement in population convergence and diversity as rewards, and different operator portfolios as actions. By using deep neural networks to model the mapping from population states to expected cumulative rewards, the approach determines the optimal operator portfolio during the evolutionary process. Embedding the approach into existing CMOEAs yields a deep reinforcement learning-assisted automated operator portfolio-based evolutionary algorithm for solving CMOPs, abbreviated as CMOEA-AOP. Empirical studies on 33 benchmark problems demonstrate that the proposed algorithm significantly enhances the performance of CMOEAs and exhibits more stable performance across different CMOPs.
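To make the notion of an operator portfolio as an action more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a portfolio is a discretized allocation of the offspring budget among operators, that the trained network's output is available as a plain list of Q-values (one per portfolio), and that selection is epsilon-greedy. The function names, the discretization granularity, and the epsilon-greedy rule are all illustrative assumptions that the abstract does not specify.

```python
import itertools
import random


def enumerate_portfolios(n_operators, steps):
    """List every way to split `steps` equal shares among the operators.

    Each tuple is one candidate action; entry i is operator i's share.
    (Assumed discretization; the abstract does not define the action set.)
    """
    return [
        shares
        for shares in itertools.product(range(steps + 1), repeat=n_operators)
        if sum(shares) == steps
    ]


def select_portfolio(q_values, epsilon, rng):
    """Epsilon-greedy choice over portfolio indices (assumed exploration rule).

    `q_values` stands in for the deep network's estimate of the expected
    cumulative reward of each portfolio in the current population state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def allocate_offspring(portfolio, n_offspring):
    """Turn portfolio shares into per-operator offspring counts.

    Uses largest-remainder rounding so the counts sum to `n_offspring`.
    """
    total = sum(portfolio)
    exact = [n_offspring * share / total for share in portfolio]
    counts = [int(x) for x in exact]
    by_remainder = sorted(
        range(len(portfolio)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[: n_offspring - sum(counts)]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    portfolios = enumerate_portfolios(n_operators=2, steps=4)
    # Placeholder Q-values; in the proposed approach these would come from
    # a deep neural network evaluated on the population state features.
    q_values = [rng.random() for _ in portfolios]
    chosen = portfolios[select_portfolio(q_values, epsilon=0.1, rng=rng)]
    print(chosen, allocate_offspring(chosen, n_offspring=100))
```

Under these assumptions, recommending a single operator corresponds to the extreme portfolios such as `(4, 0)` or `(0, 4)`, while the mixed portfolios spread the function evaluations of one generation over several operators.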
