We apply hybrid quantum deep reinforcement learning to learn navigation tasks for a simple wheeled robot in simulated environments of increasing complexity. To this end, we train parameterized quantum circuits (PQCs) with two different encoding strategies in a hybrid quantum-classical setup, alongside a classical neural-network baseline, using the double deep Q-network (DDQN) reinforcement learning algorithm. Quantum deep reinforcement learning (QDRL) has previously been studied in several relatively simple benchmark environments, mainly from the OpenAI Gym suite. However, the scaling behavior and applicability of QDRL to more demanding tasks closer to real-world problems, e.g., from the robotics domain, have not been studied before. Here, we show that quantum circuits in hybrid quantum-classical reinforcement learning setups are capable of learning optimal policies in multiple robotic navigation scenarios with notably fewer trainable parameters than a classical baseline. Across a large number of experimental configurations, we find that the employed quantum circuits outperform the classical neural-network baselines when matched for the number of trainable parameters. Nevertheless, the classical neural network consistently achieved better training times and stability, albeit with at least an order of magnitude more trainable parameters than the best-performing quantum circuits. Finally, when validating the robustness of the learning methods in a large, dynamic environment, we find that the classical baseline produces more stable and better-performing policies overall.
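Both the quantum and classical agents are trained with DDQN, whose defining feature is that the online network selects the next action while the target network evaluates it, reducing Q-value overestimation. The following is a minimal sketch of that target computation, not the authors' implementation; the function name, batch layout, and use of NumPy are illustrative assumptions, and the same update applies whether the Q-values come from a PQC or a neural network.

```python
import numpy as np

def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards:       shape (B,)   immediate rewards
    next_q_online: shape (B, A) online-network Q-values for next states
    next_q_target: shape (B, A) target-network Q-values for next states
    dones:         shape (B,)   1.0 if the episode terminated, else 0.0
    """
    # Action selection by the ONLINE network (argmax over actions) ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... but evaluation of that action by the TARGET network.
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions bootstrap with zero future value.
    return rewards + gamma * (1.0 - dones) * evaluated
```

The decoupling of selection and evaluation is the only change relative to vanilla DQN; the rest of the training loop (replay buffer, epsilon-greedy exploration, periodic target-network sync) is shared by all agent variants compared in the study.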