Belief-propagation (BP) decoding of quantum low-density parity-check (QLDPC) codes is appealing for its low complexity, yet it often suffers from convergence issues caused by quantum degeneracy and short cycles in the Tanner graph. To overcome this challenge, this paper proposes a reinforcement-learning (RL) approach that learns offline how to decode QLDPC codes from sequential decoding trajectories. Decoding is formulated as a Markov decision process in which the RL agent uses a local, syndrome-driven state representation. To enable the fast inference that practical implementation requires, we update the RL-based QLDPC decoder incrementally over second-order neighborhoods, which avoids global rescans. Simulation results on representative QLDPC codes show that the proposed RL-based QLDPC decoders outperform flooding and random sequential schedules in both decoding performance and convergence speed, while achieving performance competitive with state-of-the-art BP-based decoders at comparable complexity.
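The idea of incremental updates over second-order neighborhoods can be illustrated with a minimal sketch. It is not the paper's decoder: it assumes a single binary parity-check matrix `H` (for a CSS code this would be one of the two check matrices), and it uses a hypothetical local state per variable node, namely the number of unsatisfied adjacent checks. All function names are illustrative. The point is only that flipping one variable changes the adjacent checks, so only variables sharing a check with it need their local state refreshed.

```python
def build_adjacency(H):
    """Return (checks_of_var, vars_of_check) adjacency lists for a binary matrix H."""
    m, n = len(H), len(H[0])
    checks_of_var = [[c for c in range(m) if H[c][v]] for v in range(n)]
    vars_of_check = [[v for v in range(n) if H[c][v]] for c in range(m)]
    return checks_of_var, vars_of_check


def second_order_neighborhood(v, checks_of_var, vars_of_check):
    """Variables that share at least one check with v (v itself included)."""
    return {u for c in checks_of_var[v] for u in vars_of_check[c]}


def local_scores(residual, checks_of_var):
    """Assumed local state: count of unsatisfied checks adjacent to each variable."""
    return [sum(residual[c] for c in cs) for cs in checks_of_var]


def flip_and_update(v, residual, scores, checks_of_var, vars_of_check):
    """Flip variable v, then refresh only the scores in its second-order neighborhood."""
    for c in checks_of_var[v]:
        residual[c] ^= 1  # each adjacent check toggles its satisfaction
    touched = second_order_neighborhood(v, checks_of_var, vars_of_check)
    for u in touched:
        scores[u] = sum(residual[c] for c in checks_of_var[u])
    return touched


# Toy example (hypothetical matrix, not one of the codes in the paper).
H = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
checks_of_var, vars_of_check = build_adjacency(H)
residual = [1, 0, 0]  # syndrome of a single error on variable 0
scores = local_scores(residual, checks_of_var)
touched = flip_and_update(0, residual, scores, checks_of_var, vars_of_check)
```

In this toy example only variables 0 and 1 are revisited after the flip, and the incrementally maintained scores agree with a full recomputation. In a learned scheduler, the refreshed local states would be the inputs the agent uses to choose the next node to update.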