In this work we propose RELDEC, a novel approach for sequential decoding of moderate-length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, here we train the agent to schedule all CNs in a cluster, and all clusters, in every iteration. That is, in each learning step of RELDEC the agent learns to schedule CN clusters sequentially, guided by a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, making RELDEC suitable for LDPC codes of larger block length than those studied in our previous work. Furthermore, to address decoding under varying channel conditions, we propose agile meta-RELDEC (AM-RELDEC), which employs meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G New Radio.
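To make the scheduling idea concrete, the following is a minimal sketch of policy-driven sequential decoding, not the authors' implementation. Everything specific in it is an assumption chosen for illustration: the (7,4) Hamming parity-check matrix standing in for an LDPC code, the grouping in `CLUSTERS`, the min-sum CN update, a cluster state defined as the hard decisions of the cluster's neighboring variable nodes, and a learned policy stored as a dictionary `q_table` keyed by `(cluster, state)`. In each iteration every cluster is scheduled once, in the order the table ranks highest.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; a toy stand-in for an LDPC code.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
CLUSTERS = [[0], [1, 2]]  # hypothetical grouping of check nodes (CNs)


def cluster_state(posterior, cluster):
    """Hard decisions of the VNs attached to a cluster, packed into an integer."""
    vns = np.flatnonzero(H[cluster].any(axis=0))
    bits = (posterior[vns] < 0).astype(int)
    return int(bits @ (1 << np.arange(len(vns))))


def update_cluster(llr, c2v, cluster):
    """Min-sum update of every CN in the cluster; c2v is modified in place."""
    for c in cluster:
        vns = np.flatnonzero(H[c])
        posterior = llr + c2v.sum(axis=0)
        v2c = posterior[vns] - c2v[c, vns]  # extrinsic VN-to-CN messages
        for i, v in enumerate(vns):
            others = np.delete(v2c, i)
            c2v[c, v] = np.prod(np.sign(others)) * np.min(np.abs(others))


def decode(llr, q_table, max_iters=10):
    """Sequential decoding: each iteration schedules all clusters in policy order."""
    c2v = np.zeros(H.shape)
    for _ in range(max_iters):
        remaining = list(range(len(CLUSTERS)))
        while remaining:
            posterior = llr + c2v.sum(axis=0)
            # Rank unscheduled clusters by their (assumed) learned Q-values.
            scores = [
                q_table.get((a, cluster_state(posterior, CLUSTERS[a])), 0.0)
                for a in remaining
            ]
            a = remaining.pop(int(np.argmax(scores)))
            update_cluster(llr, c2v, CLUSTERS[a])
        hard = ((llr + c2v.sum(axis=0)) < 0).astype(int)
        if not (H @ hard % 2).any():  # all parity checks satisfied
            break
    return hard
```

With an empty `q_table` all scores tie and the clusters are visited in index order, so the sketch reduces to a fixed sequential schedule; a trained table would reorder the clusters according to the observed states.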