Traditional PID controllers offer limited adaptability in plasma shape control, while task-specific reinforcement learning (RL) methods generalize poorly and require repeated retraining. To address these limitations, this paper proposes a framework for learning a versatile, zero-shot control policy from a large-scale offline dataset of historical PID-controlled discharges. Our approach combines Generative Adversarial Imitation Learning (GAIL) with Hilbert space representation learning to pursue two objectives: imitating the stable operational style of the PID data and constructing a geometrically structured latent space for efficient, goal-directed control. The resulting foundation policy can be deployed zero-shot on diverse trajectory tracking tasks, without any task-specific fine-tuning. Evaluations on the HL-3 tokamak simulator show that the policy tracks reference trajectories for key shape parameters precisely and stably across a range of plasma scenarios. This work offers a viable pathway toward highly flexible, data-efficient intelligent control systems for future fusion reactors.