Transformer-based channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical decoding performance, yet their generalization behavior remains theoretically unexplained. This paper studies the generalization performance of the ECCT from a learning-theoretic perspective. By establishing a connection between multiplicative noise estimation errors and the bit error rate (BER), we derive an upper bound on the generalization gap via a bit-wise Rademacher complexity. The resulting bound characterizes how the gap depends on the code length, the number of model parameters, and the training set size, and it applies to both single-layer and multi-layer ECCTs. We further show that parity-check-based masked attention induces sparsity that reduces the covering number of the hypothesis class, yielding a tighter generalization bound. To the best of our knowledge, this work provides the first theoretical generalization guarantees for this class of decoders.
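To make the multiplicative-noise connection concrete, the following display records the reparametrization used by ECCT-style decoders in illustrative notation (a sketch assuming BPSK signaling over an AWGN channel; the paper's own definitions and symbols may differ).

```latex
% Multiplicative-noise view of AWGN decoding (illustrative notation).
% For BPSK, x_s \in \{\pm 1\}^n, so x_s \odot x_s = \mathbf{1} and the
% additive model y = x_s + z factors multiplicatively:
\[
  y = x_s + z
  \quad\Longleftrightarrow\quad
  y = x_s \odot \tilde{z},
  \qquad
  \tilde{z} \coloneqq x_s \odot y = \mathbf{1} + x_s \odot z .
\]
% A decoder \hat{z} = f_\theta(y) that estimates the multiplicative noise
% recovers bit i exactly when it gets the sign of \tilde{z}_i right,
% which ties the estimation error directly to the bit error rate:
\[
  \hat{x}_i = \operatorname{sign}(y_i)\,\operatorname{sign}(\hat{z}_i) = x_{s,i}
  \;\iff\;
  \operatorname{sign}(\hat{z}_i) = \operatorname{sign}(\tilde{z}_i),
  \qquad
  \mathrm{BER} = \frac{1}{n}\sum_{i=1}^{n}
    \Pr\bigl[\operatorname{sign}(\hat{z}_i) \neq \operatorname{sign}(\tilde{z}_i)\bigr].
\]
```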
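The complexity measure behind the bound follows the standard Rademacher template, shown here in its textbook form (e.g., Mohri et al.); the paper's bit-wise variant, constants, and loss class differ.

```latex
% Textbook Rademacher generalization bound (template, not the paper's bound).
% With probability at least 1 - \delta over an i.i.d. sample of size m,
% every hypothesis f in \mathcal{F} with loss \ell_f bounded in [0, 1] satisfies
\[
  \underbrace{\mathbb{E}[\ell_f]}_{\text{population risk}}
  \;\le\;
  \underbrace{\widehat{\mathbb{E}}_m[\ell_f]}_{\text{empirical risk}}
  \;+\; 2\,\mathfrak{R}_m(\ell \circ \mathcal{F})
  \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}} ,
\]
% where \mathfrak{R}_m denotes the Rademacher complexity of the loss class.
% A bit-wise analogue of \mathfrak{R}_m is what lets code length and model
% parameters enter the bound, and covering-number arguments upper-bound it.
```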
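The sparsity claim refers to ECCT's code-aware self-attention mask, which is derived from the parity-check matrix H. Below is a minimal sketch of one such construction in the spirit of the original ECCT paper; the function name and the exact mask conventions are ours and may differ from the paper's g(H).

```python
import numpy as np

def ecct_attention_mask(H: np.ndarray) -> np.ndarray:
    """Build a {0,1} self-attention mask from a parity-check matrix H.

    ECCT's input sequence has length 2n - k: n magnitude positions
    followed by n - k syndrome positions. A pair of positions may
    attend to each other only if they are connected in the Tanner
    graph of H, which makes the mask sparse for sparse codes.
    """
    m, n = H.shape                 # m = n - k parity checks, n code bits
    L = n + m                      # total sequence length
    mask = np.eye(L, dtype=bool)   # every position attends to itself

    # Two bit positions are connected if they share a parity check.
    mask[:n, :n] |= (H.T @ H) > 0

    # A bit and a syndrome position are connected if the bit
    # participates in that check, i.e. H[check, bit] = 1.
    mask[:n, n:] |= H.T.astype(bool)
    mask[n:, :n] |= H.astype(bool)
    return mask

# Toy example: parity-check matrix of the Hamming(7,4) code.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
mask = ecct_attention_mask(H)
print(f"mask shape: {mask.shape}, unmasked fraction: {mask.mean():.2f}")
```

For sparse parity-check matrices the unmasked fraction stays far below the all-ones mask of dense attention; this is the sparsity that, per the abstract, shrinks the covering number and hence tightens the bound.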