Particle accelerator operation requires the simultaneous optimization of multiple objectives. Multi-objective optimization (MOO) is particularly challenging because of the trade-offs between objectives. Evolutionary algorithms, such as the genetic algorithm (GA), have been applied to many optimization problems; however, by design they do not extend to complex control problems. This paper demonstrates the power of differentiability for solving MOO problems in particle accelerators using a Deep Differentiable Reinforcement Learning (DDRL) algorithm. We compare DDRL with Model-Free Reinforcement Learning (MFRL), GA, and Bayesian Optimization (BO) for the simultaneous optimization of heat load and trip rates at the Continuous Electron Beam Accelerator Facility (CEBAF). The underlying problem enforces strict constraints on individual states and actions, as well as a cumulative (global) constraint on the energy of the beam. We develop a physics-based surrogate model trained on real data; this surrogate model is differentiable and allows back-propagation of gradients. The results are evaluated in the form of a Pareto front for the two objectives. We show that DDRL outperforms MFRL, BO, and GA on high-dimensional problems.
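The core idea above, back-propagating gradients through a differentiable surrogate to trade off two objectives under a global energy constraint, can be illustrated with a minimal sketch. Everything below is an invented toy (the cavity count, the quadratic heat-load model, the exponential trip-rate model, and the penalty-based energy constraint are assumptions for illustration, not the paper's surrogate), but it shows the mechanics: analytic gradients of a weighted scalarization are followed by gradient descent, and sweeping the weight traces an approximate Pareto front.

```python
import math

N = 4                              # toy number of cavities
A = [1.0, 2.0, 3.0, 4.0]           # per-cavity heat-load coefficients (invented)
C = [1.0, 2.0, 3.0, 4.0]           # per-cavity trip-rate offsets (invented)
E_TARGET = 8.0                     # required total "energy" sum(g), toy units

def objectives(g):
    """Toy differentiable surrogate: heat load and trip rate vs. gradients g."""
    heat = sum(a * x * x for a, x in zip(A, g))          # heat ~ a_i * g_i^2
    trip = sum(math.exp(x - c) for c, x in zip(C, g))    # trips grow with g_i
    return heat, trip

def grad_scalarized(g, w, lam):
    """Analytic gradient of w*heat + (1-w)*trip + lam*(sum(g) - E_TARGET)^2."""
    pen = 2.0 * lam * (sum(g) - E_TARGET)
    return [w * 2.0 * a * x + (1.0 - w) * math.exp(x - c) + pen
            for a, c, x in zip(A, C, g)]

def optimize(w, steps=2000, lr=0.01, lam=10.0):
    """Gradient descent on the scalarized objective; g_i >= 0 is enforced."""
    g = [2.0] * N
    for _ in range(steps):
        dg = grad_scalarized(g, w, lam)
        g = [max(0.0, x - lr * d) for x, d in zip(g, dg)]
    return g

# Sweeping the objective weight traces an approximate Pareto front.
front = [objectives(optimize(w)) for w in (0.1, 0.5, 0.9)]
```

Because the cavities have heterogeneous coefficients, minimizing heat load and minimizing trip rate favor different gradient distributions, so the swept solutions genuinely trade one objective against the other, which is what the Pareto front in the paper visualizes at much higher dimension.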