Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. When combined with function approximators such as Neural Networks (NNs), RL can solve extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has achieved super-human performance on certain tasks. Nonetheless, Variational Quantum Circuits (VQCs) can also serve as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the gradients of these models remain substantial throughout training due to the moving targets of Deep Q-Learning. Moreover, we empirically show that increasing the number of qubits does not lead to an exponentially vanishing magnitude and variance of the gradients for a VQC approximating a 2-design, contrary to what the Barren Plateau phenomenon would predict. This hints at VQCs being especially well-suited for use as function approximators in this context.
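To make the setup concrete, the following is a minimal sketch of a data re-uploading VQC that outputs Q-value estimates, written in PennyLane. It is not the paper's implementation: the layer count, gate choices, observable assignment per action, and the CartPole-style dimensions (4 state features, 2 actions) are all illustrative assumptions.

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4   # one qubit per state feature (CartPole-style state, an assumption)
n_layers = 3   # number of data re-uploading layers (illustrative choice)

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_circuit(weights, state):
    """Data re-uploading VQC: the classical state is re-encoded in every layer."""
    for layer in range(n_layers):
        # re-upload the input: encode the state again in each layer
        for w in range(n_qubits):
            qml.RX(state[w], wires=w)
        # trainable single-qubit rotations
        for w in range(n_qubits):
            qml.RY(weights[layer, w, 0], wires=w)
            qml.RZ(weights[layer, w, 1], wires=w)
        # ring of entangling gates
        for w in range(n_qubits):
            qml.CNOT(wires=[w, (w + 1) % n_qubits])
    # one expectation value per action (2 actions assumed here)
    return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1))]

weights = np.random.uniform(0, 2 * np.pi,
                            size=(n_layers, n_qubits, 2), requires_grad=True)
state = np.array([0.1, -0.2, 0.05, 0.0])
print(q_circuit(weights, state))  # approximate Q-values, one per action
```

In a Deep Q-Learning loop, the expectation values returned by `q_circuit` would play the role of the Q-network's outputs, with `weights` updated by gradient descent on the temporal-difference loss against a (periodically refreshed) target copy of the circuit parameters.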