Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Because quantum states reside in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits as quantum kernels has been studied widely in quantum supervised learning but has been overlooked in quantum reinforcement learning. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. The approach, implemented with both numerical and analytical quantum policy gradient techniques, exploits several advantages of kernel methods, including analytic forms for the gradient of the policy and tunable expressiveness. It is suitable for vector-valued action spaces, and each formulation demonstrates a quadratic reduction in query complexity compared to its classical counterpart. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
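To make the phrase "analytic forms for the gradient of the policy" concrete, the following is a minimal classical sketch, not the paper's algorithm. It assumes a Gaussian policy whose mean is a kernel expansion over a fixed set of centres, with a fidelity kernel simulated classically for a hypothetical RY angle encoding (one qubit per state component). All names (`quantum_kernel`, `centres`, `beta`, `sigma`) are illustrative assumptions, and the sketch says nothing about query complexity on quantum hardware.

```python
import math

def qubit_state(angle):
    """Single-qubit state RY(angle)|0> = (cos(angle/2), sin(angle/2))."""
    return (math.cos(angle / 2.0), math.sin(angle / 2.0))

def quantum_kernel(s1, s2):
    """Fidelity |<phi(s1)|phi(s2)>|^2 of two product states (assumed encoding)."""
    overlap = 1.0
    for x, y in zip(s1, s2):
        (c1, d1), (c2, d2) = qubit_state(x), qubit_state(y)
        overlap *= c1 * c2 + d1 * d2  # amplitudes are real for this encoding
    return overlap ** 2

def kernel_vector(s, centres):
    """Kernel evaluations between state s and every centre."""
    return [quantum_kernel(s, c) for c in centres]

def policy_mean(s, centres, beta):
    """Vector-valued mean: mu_j(s) = sum_i beta[i][j] * k(s, centres[i])."""
    k = kernel_vector(s, centres)
    action_dim = len(beta[0])
    return [sum(k[i] * beta[i][j] for i in range(len(centres)))
            for j in range(action_dim)]

def log_prob(a, s, centres, beta, sigma):
    """Log-density of an isotropic Gaussian policy N(mu(s), sigma^2 I) at action a."""
    mu = policy_mean(s, centres, beta)
    sq = sum((aj - mj) ** 2 for aj, mj in zip(a, mu))
    d = len(a)
    return (-0.5 * sq / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))

def grad_log_prob(a, s, centres, beta, sigma):
    """Analytic gradient w.r.t. beta: grad[i][j] = k(s, c_i) * (a_j - mu_j(s)) / sigma^2."""
    k = kernel_vector(s, centres)
    mu = policy_mean(s, centres, beta)
    return [[ki * (aj - mj) / sigma ** 2 for aj, mj in zip(a, mu)] for ki in k]
```

In this sketch the number of centres is one way to tune expressiveness: adding centres adds rows to `beta` without changing the form of the gradient.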