Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis

A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that prepare a specified target quantum state from a fixed initial state, a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach uses tabular Q-learning over action sequences in a discretized quantum state space to manage the exponential growth of the state-space dimension. The framework introduces a hybrid reward mechanism that combines a static, domain-informed reward, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. The reward is therefore circuit-aware, in contrast to most current work on this topic, which relies primarily on fidelity-based rewards. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while minimizing computational overhead. In benchmarks on graph-state preparation tasks with up to seven qubits, the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's robustness and adaptability. The results confirm that this RL-driven approach, with its fully circuit-aware reward, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
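To make the idea of a hybrid reward concrete, the following is a minimal sketch of tabular Q-learning on a toy two-qubit graph-state task. It is not the authors' implementation. The gate set (H on each qubit plus CZ), the reward constants, the rounding-based discretization, and all function names (`hybrid_reward`, `train`, `greedy_circuit`, and so on) are assumptions chosen for illustration. The static part of the reward is modeled here as a simple bonus on reaching the target, and the dynamic part as a per-gate cost plus a penalty that grows with the number of times a state has already been visited in the episode.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
START = (1.0, 0.0, 0.0, 0.0)  # |00>, real amplitudes ordered 00, 01, 10, 11

def hadamard(state, qubit):
    """Apply H to `qubit` (0 = most significant bit) of a two-qubit real state."""
    stride = 2 if qubit == 0 else 1
    out = list(state)
    for i in range(4):
        if not i & stride:  # i has the target bit cleared; pair it with i | stride
            a, b = state[i], state[i | stride]
            out[i], out[i | stride] = (a + b) / SQRT2, (a - b) / SQRT2
    return tuple(out)

def cz(state):
    """Controlled-Z: flip the sign of the |11> amplitude."""
    a, b, c, d = state
    return (a, b, c, -d)

# Hypothetical action set for this toy example.
ACTIONS = {"H0": lambda s: hadamard(s, 0), "H1": lambda s: hadamard(s, 1), "CZ": cz}
NAMES = list(ACTIONS)

def key(state, decimals=6):
    """Discretize a state by rounding amplitudes, giving a hashable Q-table key."""
    return tuple(round(x, decimals) for x in state)

def fidelity(a, b):
    """Squared overlap of two real state vectors."""
    return sum(x * y for x, y in zip(a, b)) ** 2

def hybrid_reward(state, target, visits, bonus=10.0, step_cost=0.1, revisit_cost=0.5):
    """Static bonus on reaching the target minus dynamic circuit penalties.

    All constants are illustrative assumptions, not values from the paper.
    """
    done = fidelity(state, target) > 1 - 1e-9
    static = bonus if done else 0.0
    dynamic = step_cost + revisit_cost * visits  # gate cost + revisit penalty
    return static - dynamic, done

def train(target, episodes=2000, max_steps=6, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over discretized states."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        seen = {key(state): 1}  # per-episode visit counts
        for _ in range(max_steps):
            row = q.setdefault(key(state), [0.0] * len(NAMES))
            if rng.random() < eps:
                a = rng.randrange(len(NAMES))
            else:
                a = max(range(len(NAMES)), key=row.__getitem__)
            state = ACTIONS[NAMES[a]](state)
            s2 = key(state)
            visits = seen.get(s2, 0)
            seen[s2] = visits + 1
            reward, done = hybrid_reward(state, target, visits)
            future = 0.0 if done else max(q.setdefault(s2, [0.0] * len(NAMES)))
            row[a] += alpha * (reward + gamma * future - row[a])
            if done:
                break
    return q

def greedy_circuit(q, target, max_steps=6):
    """Read out a gate sequence by following the greedy policy from START."""
    state, gates = START, []
    for _ in range(max_steps):
        row = q.get(key(state))
        if row is None:
            break
        a = max(range(len(NAMES)), key=row.__getitem__)
        gates.append(NAMES[a])
        state = ACTIONS[NAMES[a]](state)
        if fidelity(state, target) > 1 - 1e-9:
            break
    return gates

# Two-qubit graph state: CZ applied to |++>.
target = cz(hadamard(hadamard(START, 0), 1))
q_table = train(target)
circuit = greedy_circuit(q_table, target)
print(circuit)
```

In this toy setting the shortest preparation uses three gates (one Hadamard per qubit followed by CZ), and the revisit penalty is what steers the agent away from self-loops such as applying CZ to |00>. Note that a visit-count penalty makes the reward depend on episode history, so the Q-table is only an approximation here. How the paper handles this is not stated in the abstract.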
