Owing to its high parallelism, belief propagation (BP) decoding is highly amenable to high-throughput implementations and thus represents a promising solution for meeting the ultra-high peak data rate of future communication systems. However, for polar codes, the error-correcting performance of BP decoding is far inferior to that of the widely used CRC-aided successive cancellation list (SCL) decoding algorithm. To close the performance gap to SCL, BP list (BPL) decoding expands the exploration of candidate codewords through multiple permuted factor graphs (PFGs). From an implementation perspective, designing a unified and flexible hardware architecture for BPL decoding that supports various PFGs and code configurations is a major challenge. In this paper, we propose the first hardware implementation of a BPL decoder for polar codes and overcome this challenge by applying a hardware-friendly algorithm that generates flexible permutations on the fly. First, we derive the graph selection gain and provide a sequential generation (SG) algorithm to obtain a near-optimal PFG set. We further prove that any permutation can be decomposed into a combination of multiple fixed routings, and we design a low-complexity permutation network to satisfy the decoding schedule. Our BPL decoder not only achieves low decoding latency by executing the decoding and permutation generation in parallel, but also supports an arbitrary list size without any area overhead. Experimental results show that, for length-1024 polar codes with a code rate of one-half, our BPL decoder with 32 PFGs achieves error-correcting performance similar to that of SCL decoding with a list size of 4, and reaches a throughput of 25.63 Gbps and an area efficiency of 29.46 Gbps/mm$^{2}$ at an SNR of 4.0 dB, which is 1.82$\times$ and 4.33$\times$ faster than the state-of-the-art BP flip and SCL decoders, respectively.
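
As a rough illustration of what a permuted factor graph amounts to, the sketch below relies on a property commonly used in the BPL literature: permuting the $n = \log_2 N$ stages of the polar factor graph is equivalent to permuting the bits of each codeword index, so a single decoder on the original graph can be reused with permuted inputs. This is only a software sketch under that assumption. It is not the SG algorithm or the permutation network proposed in the paper, and the function names (`permute_index`, `position_permutation`, `permute_llrs`) are hypothetical.

```python
def permute_index(i, stage_perm):
    """Move bit k of index i to bit position stage_perm[k]."""
    j = 0
    for k, target in enumerate(stage_perm):
        j |= ((i >> k) & 1) << target
    return j


def position_permutation(stage_perm):
    """Codeword-position permutation induced by a stage permutation.

    stage_perm is a permutation of range(n); the code length is N = 2**n.
    """
    n = len(stage_perm)
    return [permute_index(i, stage_perm) for i in range(1 << n)]


def permute_llrs(llrs, stage_perm):
    """Reorder channel LLRs so the original graph emulates the permuted one."""
    positions = position_permutation(stage_perm)
    out = [0.0] * len(llrs)
    for src, dst in enumerate(positions):
        out[dst] = llrs[src]
    return out


if __name__ == "__main__":
    # Toy case with N = 4 (n = 2): swapping the two stages swaps positions 1 and 2.
    print(position_permutation([1, 0]))
    print(permute_llrs([0.1, 0.2, 0.3, 0.4], [1, 0]))
```

In a list decoder of this kind, each candidate PFG would correspond to one `stage_perm`; how the set of permutations is chosen and how the routing is realized in hardware are exactly the questions the paper addresses.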