Constraint handling plays a key role in solving realistic complex optimization problems. Although intensively discussed over the last few decades, existing constraint handling techniques predominantly rely on human experts' designs, which more or less fall short in generality. Motivated by recent progress in Meta-Black-Box Optimization, where automated algorithm design can be learned to boost optimization performance, in this paper we propose learning an effective, adaptive, and generalizable constraint handling policy through reinforcement learning. Specifically, a tailored Markov Decision Process is first formulated, in which, given features of the optimization dynamics, a deep Q-network-based policy controls the constraint relaxation level along the underlying optimization process. Such adaptive constraint handling provides a flexible tradeoff between objective-oriented exploitation and feasible-region-oriented exploration, and hence leads to promising optimization performance. We train our approach on the CEC 2017 Constrained Optimization benchmark under a limited evaluation budget (expensive scenarios) and compare the trained constraint handling policy against strong baselines, including recent winners of CEC/GECCO competitions. Extensive experimental results show that our approach performs competitively with, or even surpasses, the compared baselines under both leave-one-out cross-validation and an ordinary train-test split. Further analysis and ablation studies reveal key insights into our designs.