We propose a Semantic Ordered Statistics Decoder (sem-OSD), a soft decoder for short linear block codes carrying byte-streamed sources such as natural-language text. Sem-OSD injects a byte-level language-model (LM) prior into ordered statistics decoding (OSD) through a fused bit-level score that combines channel reliability with the LM prior; this score drives both most-reliable-basis (MRB) selection and codeword candidate scoring. Sem-OSD enumerates two complementary test-error-pattern (TEP) families: a bit-flip family that flips up to $m$ bits, and an LM-driven family of up to $\omega$ byte substitutions that reaches error patterns the bit-flip family cannot. The LM prior is computed by a byte-level Transformer fine-tuned for byte-level denoising. Simulation results show that, on the AWGN channel, sem-OSD achieves block error rates (BLERs) below the finite-blocklength normal-approximation bound for uniform sources on both binary BCH$(127,64)$ and shortened RS$(16,8)$ over GF(256), outperforming Fossorier OSD by $1.5$ dB in coding gain. On a Gilbert--Elliott burst-error channel, sem-OSD provides $4$ dB and $1$ dB more coding gain than Berlekamp--Massey decoding and OSD, respectively.
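To make the fused score and the bit-flip TEP family concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the fusion is an additive combination of log-likelihood ratios (channel LLR plus a weighted LM-prior LLR), which the abstract does not specify; the names `fuse_llrs`, `reliability_order`, `bit_flip_teps`, and the `weight` parameter are hypothetical. The Gaussian elimination that turns the reliability ordering into an actual MRB, and the LM-driven byte-substitution family, are omitted.

```python
from itertools import combinations


def fuse_llrs(channel_llr, prior_llr, weight=1.0):
    """Assumed fusion rule: fused LLR = channel LLR + weight * LM-prior LLR.

    Convention used here: LLR > 0 favours bit 0, LLR < 0 favours bit 1.
    """
    return [c + weight * p for c, p in zip(channel_llr, prior_llr)]


def reliability_order(fused_llr):
    """Bit positions sorted from most to least reliable (largest |LLR| first)."""
    return sorted(range(len(fused_llr)), key=lambda i: -abs(fused_llr[i]))


def bit_flip_teps(k, m):
    """Yield every set of flip positions of weight 0..m over k basis positions."""
    for w in range(m + 1):
        yield from combinations(range(k), w)


# Toy values, chosen only for illustration.
channel_llr = [2.1, -0.3, 0.8, -1.7]
prior_llr = [0.5, 1.0, -0.2, -0.4]

fused = fuse_llrs(channel_llr, prior_llr)
order = reliability_order(fused)
hard = [0 if llr >= 0 else 1 for llr in fused]
teps = list(bit_flip_teps(k=4, m=2))
```

In this toy input the channel alone would decide bit 1 at position 1, but the prior pushes the fused LLR positive, so the fused hard decision becomes 0. The number of bit-flip TEPs grows as $\sum_{w=0}^{m}\binom{k}{w}$, which is why $m$ is kept small in practice.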