We present a full implementation and simulation of a novel quantum reinforcement learning method. Our work is a detailed, formal proof of concept showing how quantum algorithms can be used to solve reinforcement learning problems. It shows that, given access to error-free, efficient quantum realizations of the agent and the environment, quantum methods can yield provable improvements in sample complexity over classical Monte Carlo methods. Our approach shows in detail how to combine amplitude estimation and Grover search into a scheme for policy evaluation and policy improvement. We first develop quantum policy evaluation (QPE), which is based on a quantum-mechanical realization of a finite Markov decision process (MDP) and is quadratically more efficient, in terms of sample complexity, than an analogous classical Monte Carlo estimate. Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimal policy is reached. Finally, we implement our algorithm for a two-armed bandit MDP and simulate it.
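The quadratic gap between QPE and classical Monte Carlo evaluation can be illustrated with a small classical simulation. The sketch below is not the authors' implementation. It assumes a hypothetical one-step two-armed bandit with Bernoulli rewards in {0, 1}, so that the value of a policy equals a success probability that can be encoded as an amplitude `a`. It also reproduces only the measurement statistics of canonical phase-estimation-based amplitude estimation rather than building a quantum circuit. The function names and parameters (`p_arm0`, `reward_probs`, `m`) are illustrative assumptions.

```python
import cmath
import math
import random
import statistics

# Hypothetical setting (assumption): a one-step two-armed bandit whose arms pay
# reward 1 with probability reward_probs[arm] and reward 0 otherwise.


def true_value(p_arm0, reward_probs):
    """Exact value of a stochastic policy that pulls arm 0 with probability p_arm0."""
    return p_arm0 * reward_probs[0] + (1 - p_arm0) * reward_probs[1]


def mc_policy_value(p_arm0, reward_probs, n_samples, rng):
    """Classical Monte Carlo policy evaluation: mean reward over n_samples episodes."""
    total = 0
    for _ in range(n_samples):
        arm = 0 if rng.random() < p_arm0 else 1
        total += 1 if rng.random() < reward_probs[arm] else 0
    return total / n_samples


def ae_outcome_distribution(a, m):
    """Outcome distribution of canonical amplitude estimation with m counting qubits.

    With sin(theta)**2 = a, the Grover operator has eigenphases theta/pi and
    1 - theta/pi (in units of 2*pi), and the initial state puts weight 1/2 on each
    eigenvector, so the two phase-estimation distributions are mixed 50/50.
    """
    M = 2 ** m
    theta = math.asin(math.sqrt(a))
    probs = []
    for y in range(M):
        p = 0.0
        for phi in (theta / math.pi, 1 - theta / math.pi):
            amp = sum(cmath.exp(2j * math.pi * k * (phi - y / M)) for k in range(M)) / M
            p += 0.5 * abs(amp) ** 2
        probs.append(p)
    return probs


def simulated_amplitude_estimates(a, m, n_runs, rng):
    """Sample n_runs outcomes y and map each to the estimate sin(pi * y / M)**2."""
    M = 2 ** m
    ys = rng.choices(range(M), weights=ae_outcome_distribution(a, m), k=n_runs)
    return [math.sin(math.pi * y / M) ** 2 for y in ys]


if __name__ == "__main__":
    rng = random.Random(0)
    p_arm0, reward_probs = 0.3, (0.2, 0.8)
    v = true_value(p_arm0, reward_probs)
    for m in (4, 6, 8):
        budget = 2 ** m  # MC samples vs. roughly this many Grover-operator applications
        mc_errs = [abs(mc_policy_value(p_arm0, reward_probs, budget, rng) - v)
                   for _ in range(200)]
        ae_errs = [abs(e - v) for e in simulated_amplitude_estimates(v, m, 200, rng)]
        print(f"budget={budget:4d}  MC median error={statistics.median(mc_errs):.4f}"
              f"  AE median error={statistics.median(ae_errs):.4f}")
```

For a budget of about M = 2^m, the simulated amplitude-estimation error should shrink roughly like 1/M, while the Monte Carlo error with M samples shrinks like 1/sqrt(M). The exact numbers printed depend on the seed and the assumed bandit parameters.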