Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis

A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach uses tabular Q-learning, based on action sequences, within a discretized quantum state space to manage the exponential growth of the state-space dimension. The framework introduces a hybrid reward mechanism that combines a static, domain-informed reward, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. The reward is therefore circuit-aware, in contrast to most current work on this topic, which relies primarily on fidelity-based rewards. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while keeping computational overhead low. In benchmarks on graph-state preparation tasks with up to seven qubits, the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's robustness and adaptability. The results confirm that this RL-driven approach, with its fully circuit-aware reward, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
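To make the hybrid reward idea concrete, the following is a minimal sketch of tabular Q-learning with a static goal bonus and dynamic penalties, applied to a two-qubit graph state. It is not the implementation behind the reported results. Everything specific here is an assumption made for illustration: the gate set `ACTIONS`, the helper names (`apply_gate`, `state_key`, `hybrid_reward`, `train`, `greedy_circuit`), the reward magnitudes, the sparse form of the static reward, the definition of congestion as excess gates on the busiest qubit, and the choice to key the Q-table by the discretized state. Dense Python lists stand in for the sparse matrix representations mentioned above.

```python
import math
import random

N_QUBITS = 2
# Hypothetical gate set for this sketch: Hadamard on each qubit and one CZ.
ACTIONS = [("H", 0), ("H", 1), ("CZ", (0, 1))]

def apply_gate(state, gate):
    """Apply H or CZ to a real state vector stored as a list of 2**n amplitudes."""
    name, target = gate
    new = list(state)
    if name == "H":
        bit = 1 << target
        for i in range(len(state)):
            if not (i & bit):
                a, b = state[i], state[i | bit]
                new[i] = (a + b) / math.sqrt(2)
                new[i | bit] = (a - b) / math.sqrt(2)
    else:  # CZ: flip the sign of amplitudes where both qubits are 1
        mask = (1 << target[0]) | (1 << target[1])
        for i in range(len(state)):
            if (i & mask) == mask:
                new[i] = -state[i]
    return new

def state_key(state, decimals=3):
    """Discretize a state: round amplitudes and fix the global sign (assumed scheme)."""
    rounded = [round(a, decimals) + 0.0 for a in state]
    lead = next((a for a in rounded if a != 0.0), 1.0)
    sign = -1.0 if lead < 0 else 1.0
    return tuple(sign * a + 0.0 for a in rounded)

def hybrid_reward(reached_target, revisited, gate_counts, goal=10.0,
                  step_cost=1.0, revisit_pen=1.0, congestion_pen=0.5, max_per_qubit=2):
    """Static goal bonus minus dynamic penalties (all magnitudes are illustrative)."""
    static = goal if reached_target else 0.0
    congestion = max(0, max(gate_counts) - max_per_qubit)
    dynamic = step_cost + (revisit_pen if revisited else 0.0) + congestion_pen * congestion
    return static - dynamic

def train(target, episodes=3000, max_steps=6, alpha=0.2, gamma=0.9, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and linearly decaying epsilon."""
    rng = random.Random(seed)
    start = [1.0] + [0.0] * (2 ** N_QUBITS - 1)
    target_key = state_key(target)
    q = {}
    for ep in range(episodes):
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))
        state, visited, counts = start, {state_key(start)}, [0] * N_QUBITS
        for _ in range(max_steps):
            row = q.setdefault(state_key(state), [0.0] * len(ACTIONS))
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else row.index(max(row))
            name, tgt = ACTIONS[a]
            for qubit in ((tgt,) if name == "H" else tgt):
                counts[qubit] += 1
            state = apply_gate(state, ACTIONS[a])
            s2 = state_key(state)
            done = s2 == target_key
            r = hybrid_reward(done, s2 in visited, counts)
            visited.add(s2)
            future = 0.0 if done else max(q.setdefault(s2, [0.0] * len(ACTIONS)))
            row[a] += alpha * (r + gamma * future - row[a])
            if done:
                break
    return q

def greedy_circuit(q, target, max_steps=6):
    """Follow the greedy policy from |00>; return the gate list, or None if it fails."""
    state = [1.0] + [0.0] * (2 ** N_QUBITS - 1)
    target_key, circuit = state_key(target), []
    for _ in range(max_steps):
        row = q.get(state_key(state))
        if row is None:
            return None
        a = row.index(max(row))
        circuit.append(ACTIONS[a])
        state = apply_gate(state, ACTIONS[a])
        if state_key(state) == target_key:
            return circuit
    return None

# Target: the two-qubit graph state with a single edge, CZ (H x H)|00>.
target = [1.0, 0.0, 0.0, 0.0]
for g in [("H", 0), ("H", 1), ("CZ", (0, 1))]:
    target = apply_gate(target, g)

q_table = train(target)
print(greedy_circuit(q_table, target))
```

In this toy setting the revisit penalty makes the self-loop of CZ on |00> and any step back to an earlier state strictly worse than progressing toward the target, which is the behavior the dynamic penalties are meant to encourage. Scaling the same idea to seven qubits would need the sparse representations and discretization described above, which this sketch does not attempt.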
