The Pareto Envelope Augmented with Reinforcement Learning (PEARL) is a novel method developed to address multi-objective problems, particularly in engineering, where evaluating candidate solutions can be time-consuming. PEARL differs from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, which eliminates the need for multiple neural networks that independently solve simpler sub-problems. Several versions inspired by deep learning and evolutionary techniques have been developed for both unconstrained and constrained problem domains. In these versions, Curriculum Learning is used to manage constraints. PEARL's performance is first evaluated on classical multi-objective benchmarks. It is then tested on two practical Pressurized Water Reactor (PWR) core Loading Pattern optimization problems to demonstrate its real-world applicability. The first problem optimizes the cycle length and the rod-integrated peaking factor as the primary objectives, while the second adds the average enrichment as an additional objective. PEARL also addresses three types of constraints, related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, and specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without additional effort from the algorithm designer, in contrast to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the hypervolume.
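The Pareto front and the hypervolume metric mentioned above can be made concrete with a small sketch. This is not PEARL's implementation; it only illustrates, for two objectives, how non-dominated solutions are filtered and how the hypervolume indicator scores a front against a reference point. The function names (`dominates`, `pareto_front`, `hypervolume_2d`), the sample points, and the reference point are all hypothetical, and both objectives are assumed to be minimized (an objective that is maximized, such as cycle length, can be negated first).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point `ref`."""
    # Keep only front points that are strictly better than the reference point.
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_f2 = ref[1]
    # The front is sorted by ascending f1, so f2 is strictly descending.
    # Each point contributes one horizontal slab of the dominated region.
    for f1, f2 in front:
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


if __name__ == "__main__":
    candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
    print(pareto_front(candidates))
    print(hypervolume_2d(candidates, ref=(5.0, 5.0)))
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, which is why it is commonly used to compare the fronts returned by different optimizers.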