This paper presents a Quantum Reinforcement Learning (QRL) solution, based on Variational Quantum Circuits, to the dynamic portfolio optimization problem. The implemented QRL approaches are quantum analogues of the classical neural-network-based Deep Deterministic Policy Gradient and Deep Q-Network algorithms. Through an empirical evaluation on real-world financial data, we show that our quantum agents achieve risk-adjusted performance comparable to, and in some cases exceeding, that of classical Deep RL models with several orders of magnitude more parameters. However, while quantum circuit execution is inherently fast at the hardware level, practical deployment on cloud-based quantum systems introduces substantial latency, so end-to-end runtime is currently dominated by infrastructural overhead, limiting practical applicability. Taken together, our results suggest that QRL is theoretically competitive with state-of-the-art classical reinforcement learning and may become practically advantageous as deployment overheads diminish. This positions QRL as a promising paradigm for dynamic decision-making in complex, high-dimensional, and non-stationary environments such as financial markets. The complete codebase is released as open source at: https://github.com/VincentGurgul/qrl-dpo-public