In existing quantum state preparation methods for two-qubit systems, the large state space and the use of a step-wise ("ladder") reward function lead to slow convergence and make it difficult to prepare the desired target quantum state with high fidelity under constrained conditions. To address these problems, this paper proposes a difference-driven reinforcement learning (RL) algorithm for quantum state preparation in two-qubit systems, which improves both the reward function and the action selection strategy. First, the two-qubit state preparation problem is modeled with constraints on the types of quantum gates and on the evolution time of the quantum state. During preparation, a weighted differential dynamic reward function is designed to help the algorithm quickly obtain the maximum expected cumulative reward. Then, an adaptive ε-greedy action selection strategy is adopted to balance exploration and exploitation to a certain extent, thereby improving the fidelity of the final quantum state. Simulation results show that the proposed algorithm can prepare quantum states with high fidelity under these constraints. Compared with other algorithms, it improves both the convergence speed and the fidelity of the final quantum state to varying degrees.
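The abstract does not give the exact form of the weighted differential reward or the ε adaptation rule, so the following is only a toy sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes tabular Q-learning (the paper's learner may differ), a hypothetical three-gate set {identity, Hadamard on the first qubit, CNOT}, a step limit standing in for the evolution-time constraint, a reward equal to a weighted change in fidelity plus a terminal bonus, and an ε that shrinks as the best fidelity found so far rises. All function names (`difference_reward`, `adaptive_epsilon`, `train`, `greedy_rollout`) and parameter values are hypothetical.

```python
import math
import random

# Two-qubit pure states: lists of 4 complex amplitudes in the basis
# order |00>, |01>, |10>, |11>.
R = 1 / math.sqrt(2)

def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two normalized pure states."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

# Hypothetical restricted gate set: identity, Hadamard on the first qubit,
# and CNOT with the first qubit as control (swaps |10> and |11>).
def gate_identity(s):
    return list(s)

def gate_h0(s):
    return [R * (s[0] + s[2]), R * (s[1] + s[3]),
            R * (s[0] - s[2]), R * (s[1] - s[3])]

def gate_cnot01(s):
    return [s[0], s[1], s[3], s[2]]

ACTIONS = [gate_identity, gate_h0, gate_cnot01]

def difference_reward(f_prev, f_curr, weight=10.0, goal=0.99, bonus=100.0):
    """Assumed difference-driven reward: weighted per-step change in
    fidelity, plus a bonus once the goal fidelity is reached."""
    reward = weight * (f_curr - f_prev)
    if f_curr >= goal:
        reward += bonus
    return reward

def adaptive_epsilon(best_fidelity, eps_min=0.05, eps_max=0.9):
    """Assumed adaptive rule: explore more while the best fidelity is low."""
    best = min(max(best_fidelity, 0.0), 1.0)
    return eps_min + (eps_max - eps_min) * (1.0 - best)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over the discrete gate set."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def state_key(s, ndigits=6):
    """Round amplitudes so that equal states share one Q-table entry."""
    return tuple((round(a.real, ndigits), round(a.imag, ndigits)) for a in s)

def train(target, episodes=500, max_steps=4, alpha=0.5, gamma=0.9,
          goal=0.99, seed=0):
    """Tabular Q-learning; max_steps stands in for the evolution-time limit."""
    rng = random.Random(seed)
    q, best = {}, 0.0
    for _ in range(episodes):
        s = [1 + 0j, 0j, 0j, 0j]  # every episode starts from |00>
        f_prev = fidelity(s, target)
        eps = adaptive_epsilon(best)  # epsilon adapts between episodes
        for _ in range(max_steps):
            q_s = q.setdefault(state_key(s), [0.0] * len(ACTIONS))
            a = select_action(q_s, eps, rng)
            s_next = ACTIONS[a](s)
            f_curr = fidelity(s_next, target)
            r = difference_reward(f_prev, f_curr, goal=goal)
            done = f_curr >= goal
            q_next = q.setdefault(state_key(s_next), [0.0] * len(ACTIONS))
            td_target = r if done else r + gamma * max(q_next)
            q_s[a] += alpha * (td_target - q_s[a])
            best = max(best, f_curr)
            s, f_prev = s_next, f_curr
            if done:
                break
    return q, best

def greedy_rollout(q, target, max_steps=4, goal=0.99):
    """Follow the learned greedy policy from |00>; return gates and fidelity."""
    s, names = [1 + 0j, 0j, 0j, 0j], []
    for _ in range(max_steps):
        if fidelity(s, target) >= goal:
            break
        q_s = q.get(state_key(s), [0.0] * len(ACTIONS))
        a = max(range(len(ACTIONS)), key=lambda i: q_s[i])
        s = ACTIONS[a](s)
        names.append(ACTIONS[a].__name__)
    return names, fidelity(s, target)

if __name__ == "__main__":
    bell = [complex(R), 0j, 0j, complex(R)]  # (|00> + |11>) / sqrt(2)
    q, best = train(bell)
    print(best, greedy_rollout(q, bell))
```

The point of the difference-based reward in this sketch is that every step that changes the fidelity produces a signal. For the Bell target, the first Hadamard lowers the fidelity from 0.5 to 0.25 and is penalized only mildly, while the discounted bonus from the following CNOT still makes it the preferred first move. A thresholded step reward, by contrast, would stay flat over such intermediate moves.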