We propose a model-based reinforcement learning (RL) approach for noisy time-dependent gate optimization with improved sample complexity over model-free RL, where sample complexity is the number of controller interactions with the physical system. Inspired by recent advances in neural ordinary differential equations (ODEs), we build in an inductive bias: the model approximating the environment is an auto-differentiable ODE parametrised by a learnable Hamiltonian ansatz, whose time-dependent part, including the control, is fully known. The continuous time-independent parameters of the Hamiltonian are learned alongside the control through interactions with the system. We demonstrate an order-of-magnitude advantage in sample complexity over standard model-free RL in preparing some standard unitary gates under closed and open system dynamics, in realistic numerical experiments incorporating single-shot measurements, arbitrary Hilbert space truncations, and uncertainty in Hamiltonian parameters. Moreover, the learned Hamiltonian can be used by existing control methods such as GRAPE for further gradient-based optimization, with the controllers found by RL as initializations. Our algorithm, which we apply to nitrogen vacancy (NV) centers and transmons in this paper, is well suited to controlling partially characterised one- and two-qubit systems.
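To make the idea of a learnable Hamiltonian model concrete, the following is a minimal single-qubit sketch, not the implementation used in this work. It assumes a toy Hamiltonian H(t) = (delta/2) Z + (u(t)/2) X, in which the control u(t) is known and piecewise constant and the detuning delta is the unknown time-independent parameter. All function names are hypothetical, and a central finite difference stands in for automatic differentiation so that the example needs only the Python standard library.

```python
import math


def step_unitary(delta, u, dt):
    """Exact propagator exp(-i H dt) for the toy H = (delta/2) Z + (u/2) X."""
    omega = math.hypot(delta, u)  # magnitude of the combined field
    if omega == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    c = math.cos(omega * dt / 2)
    s = math.sin(omega * dt / 2)
    nz, nx = delta / omega, u / omega
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]


def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def propagate(delta, controls, dt):
    """Total unitary for a piecewise-constant control sequence."""
    total = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for u in controls:
        total = matmul2(step_unitary(delta, u, dt), total)
    return total


def gate_fidelity(unitary, target):
    """Phase-insensitive gate fidelity |Tr(target^dagger U)|^2 / 4."""
    tr = sum(target[j][i].conjugate() * unitary[j][i]
             for i in range(2) for j in range(2))
    return abs(tr) ** 2 / 4


def excited_population(delta, controls, dt):
    """Probability of measuring |1> after starting in |0>."""
    return abs(propagate(delta, controls, dt)[1][0]) ** 2


def model_loss(delta, experiments, dt):
    """Squared error between predicted and observed populations.

    `experiments` is a list of (controls, observed_population) pairs.
    """
    return sum((excited_population(delta, controls, dt) - observed) ** 2
               for controls, observed in experiments)


def loss_gradient(delta, experiments, dt, eps=1e-6):
    """Central finite difference; a stand-in for automatic differentiation."""
    upper = model_loss(delta + eps, experiments, dt)
    lower = model_loss(delta - eps, experiments, dt)
    return (upper - lower) / (2 * eps)
```

A gradient-based optimizer would update `delta` using `loss_gradient`, while the controller is scored with `gate_fidelity` on the model's prediction. In this sketch the observed populations are exact probabilities; single-shot measurements would replace them with noisy estimates.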