On-sky demonstration of reinforcement learning for adaptive optics control

Jalo Nousiainen, Vincent Chambouleyron, Benoit Neichel, Sylvain Cetre, Jean-Francois Sauvage, Angelie Alagao, Markus Kasper, Jonathan Dray, Romain Fetick, Byron Engler

From arXiv; 11 pages, 12 figures; accepted by A&A

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of an RL controller for AO, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus AO system installed at the Coudé focus of the 1.52 m telescope (T152) at the Observatoire de Haute-Provence (OHP). A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO outperformed the standard integrator in all tested configurations. The controller learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation that introduced approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust, high-performance turnkey controller for single-conjugate AO systems, paving the way for broader adoption of RL strategies in on-sky AO operations.
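
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a (leaky) integrator control law acting on modal coefficients. It is not the authors' code or the Papyrus configuration: the function names, the gain of 0.4, the static disturbance, and the assumption of noise-free modal measurements with no loop latency are all illustrative choices.

```python
import random


def integrator_step(commands, measurements, gain=0.4, leak=1.0):
    """One integrator update per mode: c[k+1] = leak * c[k] - gain * m[k].

    `gain` and `leak` are illustrative values, not taken from the paper.
    """
    return [leak * c - gain * m for c, m in zip(commands, measurements)]


def run_closed_loop(n_steps=50, n_modes=5, gain=0.4, seed=0):
    """Toy closed loop: static disturbance, perfect measurements, no delay.

    Returns the residual RMS at each step, measured before that step's
    command update is applied.
    """
    rng = random.Random(seed)
    # Hypothetical static aberration expressed as modal coefficients.
    disturbance = [rng.gauss(0.0, 1.0) for _ in range(n_modes)]
    commands = [0.0] * n_modes
    rms_history = []
    for _ in range(n_steps):
        # Residual seen by the sensor: disturbance plus the current correction.
        residual = [d + c for d, c in zip(disturbance, commands)]
        rms_history.append((sum(r * r for r in residual) / n_modes) ** 0.5)
        # Idealized assumption: the sensor measures the residual exactly.
        commands = integrator_step(commands, residual, gain)
    return rms_history


if __name__ == "__main__":
    history = run_closed_loop()
    print(f"initial residual RMS: {history[0]:.3e}")
    print(f"final residual RMS:   {history[-1]:.3e}")
```

In this idealized setting the residual shrinks by a factor of (1 - gain) per step. A real loop adds measurement noise, latency, and a time-varying disturbance, which is where the choice of gain becomes a trade-off and where a learned controller such as PO4AO is reported to help.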
