Quantum reinforcement learning (QRL) has emerged as a framework for solving sequential decision-making tasks and has shown empirical quantum advantages. A notable development is the use of quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models that incorporate QRNNs are inefficient to train, because computing gradients through a QRNN is both computationally expensive and time-consuming. This work addresses this challenge by constructing QRL agents from QRNN-based reservoirs, specifically quantum long short-term memory (QLSTM). The QLSTM parameters are randomly initialized and then kept fixed, without training. The model is trained using the asynchronous advantage actor-critic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, where it achieves results comparable to those of a fully trained QLSTM RL model with identical architecture and training settings.
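The reservoir idea can be illustrated with a minimal classical sketch. It is not the implementation used in this work: a small tanh recurrent network stands in for the QLSTM (an assumption made so the example runs without a quantum simulator), the update is a single-worker advantage actor-critic step rather than full asynchronous A3C, and the names `FrozenReservoir` and `ReadoutHeads` are hypothetical. The point it shows is structural: the recurrent core is randomly initialized and never updated, while only the linear actor and critic heads reading its state are trained.

```python
import math
import random


class FrozenReservoir:
    """Classical tanh recurrent core standing in for the fixed QLSTM (assumption)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Randomly initialized once and never updated afterwards.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def step(self, obs, h):
        # New hidden state from the current observation and the previous state.
        return [
            math.tanh(
                sum(w * x for w, x in zip(self.w_in[i], obs))
                + sum(w * v for w, v in zip(self.w_rec[i], h))
            )
            for i in range(self.n_hidden)
        ]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ReadoutHeads:
    """Trainable linear actor and critic heads on top of the reservoir state."""

    def __init__(self, n_hidden, n_actions):
        self.actor = [[0.0] * n_hidden for _ in range(n_actions)]
        self.critic = [0.0] * n_hidden

    def policy(self, h):
        return softmax([sum(w * v for w, v in zip(row, h)) for row in self.actor])

    def value(self, h):
        return sum(w * v for w, v in zip(self.critic, h))

    def update(self, h, action, target, lr=0.1):
        """One advantage actor-critic step; target is the (bootstrapped) return."""
        advantage = target - self.value(h)
        probs = self.policy(h)
        for a, row in enumerate(self.actor):
            # Gradient of log pi(action | h) with respect to the logit of action a.
            grad_logit = (1.0 if a == action else 0.0) - probs[a]
            for j in range(len(row)):
                row[j] += lr * advantage * grad_logit * h[j]
        for j in range(len(self.critic)):
            # Semi-gradient step moving the value estimate toward the target.
            self.critic[j] += lr * advantage * h[j]
        return advantage


# Usage: one observation, one update; only the heads change.
reservoir = FrozenReservoir(n_in=2, n_hidden=4, seed=0)
heads = ReadoutHeads(n_hidden=4, n_actions=2)
h = reservoir.step([1.0, 0.0], [0.0] * 4)
heads.update(h, action=1, target=1.0)
```

Because no gradient flows through the recurrent core, each update costs only as much as the readout heads, which is the source of the training savings the reservoir approach targets.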