A Family of Low-Complexity Binary Codes with Constant Hamming Weights

In this paper, we focus on the design of binary constant weight codes that admit low-complexity encoding and decoding algorithms and whose size is a power of two, $M=2^k$. For every integer $\ell \geq 3$, we construct an $(n=2^\ell, M=2^{k_{\ell}}, d=2)$ constant weight code ${\cal C}[\ell]$ of weight $\ell$ by encoding information in the gaps between successive $1$'s. The code is associated with an integer sequence of length $\ell$ that satisfies a constraint, termed {\em anchor-decodability}, which ensures low-complexity encoding and decoding. The encoding complexity is linear in the input size $k$, and the decoding complexity is poly-logarithmic in the input size $n$, discounting the linear time spent parsing the input. Unlike many existing schemes, neither algorithm requires the expensive computation of binomial coefficients. Among the codes generated by all anchor-decodable sequences, we show that ${\cal C}[\ell]$ has the maximum size, with $k_{\ell} \geq \ell^2-\ell\log_2\ell + \log_2\ell - 0.279\ell - 0.721$. Since $k$ is information-theoretically upper bounded by $\ell^2-\ell\log_2\ell +O(\ell)$, the code ${\cal C}[\ell]$ is optimal in size with respect to the two highest-order terms in $\ell$. In particular, $k_\ell$ meets the upper bound for $\ell=3$ and is one bit away from it for $\ell=4$. On the other hand, we show that ${\cal C}[\ell]$ is not unique in attaining $k_{\ell}$ by constructing an alternative code ${\cal \hat{C}}[\ell]$, again parameterized by an integer $\ell \geq 3$, which has a different low-complexity decoder yet the same size $2^{k_{\ell}}$ when $3 \leq \ell \leq 7$. Finally, we derive new codes by modifying ${\cal C}[\ell]$; these offer a wider range of blocklengths and weights while retaining low-complexity encoding and decoding. For certain parameter values, these modified codes also have an optimal $k$.
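
As a quick numerical check of the two bounds quoted above, the following is a minimal sketch that evaluates the stated lower bound on $k_\ell$ and a counting upper bound. It assumes that the information-theoretic upper bound is taken as $k \leq \lfloor \log_2 \binom{2^\ell}{\ell} \rfloor$, which holds because any two distinct weight-$\ell$ words are at Hamming distance at least 2, so a $d=2$ code of weight $\ell$ may contain any subset of them. The function names are hypothetical and are not taken from the paper; the sketch does not implement ${\cal C}[\ell]$ itself.

```python
import math


def upper_bound_k(ell: int) -> int:
    """Counting bound (assumed): the largest k with 2**k <= C(2**ell, ell).

    All weight-ell words of length n = 2**ell are pairwise at distance >= 2,
    so no d = 2 constant weight code of weight ell can exceed C(n, ell) words.
    """
    n = 2 ** ell
    # floor(log2(x)) for a positive integer x is x.bit_length() - 1 (exact, no floats)
    return math.comb(n, ell).bit_length() - 1


def lower_bound_k(ell: int) -> float:
    """Right-hand side of the stated lower bound on k_ell."""
    lg = math.log2(ell)
    return ell ** 2 - ell * lg + lg - 0.279 * ell - 0.721


if __name__ == "__main__":
    for ell in range(3, 9):
        # k_ell is an integer, so it is at least the rounded-up lower bound
        lo = math.ceil(lower_bound_k(ell))
        print(f"ell={ell}: k_ell >= {lo}, counting bound {upper_bound_k(ell)}")
```

For $\ell=3$ the rounded-up lower bound is 5, equal to the counting bound $\lfloor \log_2 56 \rfloor = 5$; for $\ell=4$ it is 9 against $\lfloor \log_2 1820 \rfloor = 10$, consistent with the one-bit gap stated above.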
