Augmenting the balanced residue number system moduli set $\{m_1=2^n, m_2=2^n-1, m_3=2^n+1\}$ with the co-prime modulus $m_4=2^{2n}+1$ increases the dynamic range (DR) by around 70%. The Mersenne form of the product $m_2 m_3 m_4=2^{4n}-1$ in the moduli set $\{m_1,m_2,m_3,m_4\}$ leads to a very efficient reverse converter based on the New Chinese remainder theorem. However, the double bit-width of the $m_4$ residue channel is counter-productive and jeopardizes the speed balance of $\{m_1,m_2,m_3\}$. Therefore, we decompose $m_4$ into two complex-number $n$-bit moduli $2^n\pm\sqrt{-1}$, which preserves the DR and the co-primality across the augmented moduli set. The required forward conversion from modulo-$(2^{2n}+1)$ to moduli-$(2^n\pm\sqrt{-1})$, and the reverse conversion, are immediate and cost-free. The proposed unified moduli-$(2^n\pm\sqrt{-1})$ adder and multiplier are tested and synthesized on a Spartan 7S100 FPGA. The 6-bit look-up tables (LUTs) therein favor LUT realizations of the adders and multipliers for $n=5$, where the DR equals $2^{25}-2^5$. However, the experiments show that, to cover all 32-bit numbers, the power-of-two channel $m_1$ can be as wide as 12 bits with no harm to the speed balance across the five moduli. The results also show that the moduli-$(2^5\pm\sqrt{-1})$ add and multiply operations are advantageous over their moduli-$(2^5\pm1)$ counterparts in speed, cost, and energy measures, and are collectively better than those of modulo-$(2^{10}+1)$.
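The arithmetic behind these claims can be checked numerically for $n=5$. The following is a minimal Python sketch, not the authors' hardware design: the function names are hypothetical, and the split-based conversion is an assumed reading of "immediate and cost-free", relying on $2^n \equiv \sqrt{-1}$ modulo $2^n-\sqrt{-1}$ so that the low and high halves of a residue become the real and imaginary parts.

```python
from math import gcd

def moduli(n):
    """Return (m1, m2, m3, m4) = (2^n, 2^n - 1, 2^n + 1, 2^(2n) + 1)."""
    return 2**n, 2**n - 1, 2**n + 1, 2**(2 * n) + 1

def to_complex(x, n):
    """Assumed forward conversion: write x = lo + 2^n * hi and read 2^n as i.

    Returns (real, imag) = (lo, hi). No arithmetic is needed beyond
    splitting the bit string, which is why it is called cost-free here.
    """
    hi, lo = divmod(x, 2**n)
    return lo, hi

def from_complex(re, im, n):
    """Assumed reverse conversion: map re + i*im back to an integer mod 2^(2n) + 1."""
    return (re + im * 2**n) % (2**(2 * n) + 1)

def complex_add(p, q):
    """Component-wise addition of two (real, imag) pairs."""
    return p[0] + q[0], p[1] + q[1]

def complex_mul(p, q):
    """Gaussian-integer product (a + ib)(c + id)."""
    (a, b), (c, d) = p, q
    return a * c - b * d, a * d + b * c

n = 5
m1, m2, m3, m4 = moduli(n)

# Mersenne-form product and dynamic range quoted in the abstract.
assert m2 * m3 * m4 == 2**(4 * n) - 1
assert m1 * m2 * m3 * m4 == 2**25 - 2**5

# Pairwise co-primality of the four integer moduli.
ms = [m1, m2, m3, m4]
assert all(gcd(a, b) == 1 for i, a in enumerate(ms) for b in ms[i + 1:])

# (2^n + i)(2^n - i) = 2^(2n) + 1, so the decomposition keeps the DR.
assert complex_mul((2**n, 1), (2**n, -1)) == (m4, 0)

# Complex-domain add and multiply agree with modulo-(2^(2n) + 1) arithmetic.
x, y = 777, 1001
px, py = to_complex(x, n), to_complex(y, n)
assert from_complex(*complex_add(px, py), n) == (x + y) % m4
assert from_complex(*complex_mul(px, py), n) == (x * y) % m4
```

The sketch reduces results only when mapping back to an integer; a hardware channel would instead keep the real and imaginary parts within $n$ bits after every operation, which is not modeled here.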