Reinforcement learning (RL) has shown promising results for real-time control systems, including plasma magnetic control. However, it still has significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method by achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the time required to learn new tasks. We build on \cite{degrave2022magnetic} and present algorithmic improvements to the agent architecture and training procedure. Simulation results show up to 65\% improvement in shape accuracy, a substantial reduction in the long-term bias of the plasma current, and a reduction in the training time required to learn new tasks by a factor of 3 or more. We also present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results and point the way towards routinely achieving accurate discharges using the RL approach.