The Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either support only a limited set of parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm co-design methodology that efficiently accelerates NTT across various parameter settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state of the art.
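To make the two ingredients named above concrete, the following is a minimal software sketch, not the in-cache design proposed in this paper. It assumes a textbook shift-and-add (interleaved) modular multiplication that avoids division, and a naive quadratic-time NTT over a toy modulus. The function names (`mod_mul_shift_add`, `ntt`, `cyclic_poly_mul`) and the parameters `q = 17`, `n = 4`, `omega = 4` are illustrative assumptions. For simplicity it computes a cyclic convolution (multiplication modulo x^n - 1), whereas lattice-based schemes typically use the negacyclic variant.

```python
def mod_mul_shift_add(a, b, q):
    """Return (a * b) mod q using only shifts, additions, comparisons, and
    subtractions (no division). Assumes 0 <= a < q and b >= 0."""
    result = 0
    # Scan the bits of b from most significant to least significant.
    for i in reversed(range(b.bit_length())):
        result <<= 1              # double the running result
        if result >= q:           # result < 2q here, so one subtraction suffices
            result -= q
        if (b >> i) & 1:
            result += a           # conditionally add the multiplicand
            if result >= q:
                result -= q
    return result


def ntt(coeffs, omega, q):
    """Naive O(n^2) NTT: out[k] = sum_j coeffs[j] * omega^(j*k) mod q.
    Twiddle factors use Python's built-in pow; the products use the
    shift-and-add multiplier above."""
    out = []
    for k in range(len(coeffs)):
        acc = 0
        for j, c in enumerate(coeffs):
            twiddle = pow(omega, j * k, q)
            acc += mod_mul_shift_add(c % q, twiddle, q)
            if acc >= q:
                acc -= q
        out.append(acc)
    return out


def cyclic_poly_mul(a, b, omega, q):
    """Multiply two length-n polynomials modulo (x^n - 1, q) via the NTT.
    omega must be a primitive n-th root of unity modulo q."""
    n = len(a)
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    pointwise = [mod_mul_shift_add(x, y, q) for x, y in zip(fa, fb)]
    # Inverse transform: use omega^-1, then scale by n^-1 (Python 3.8+).
    omega_inv = pow(omega, -1, q)
    n_inv = pow(n, -1, q)
    return [mod_mul_shift_add(v, n_inv, q) for v in ntt(pointwise, omega_inv, q)]


if __name__ == "__main__":
    # Toy parameters (assumed): 4 has order 4 modulo 17.
    print(cyclic_poly_mul([1, 2, 3, 4], [5, 6, 7, 8], omega=4, q=17))
```

The multiplier performs one shift and at most two conditional subtractions per bit of `b`, which is the kind of division-free structure that maps naturally onto bitwise hardware; how the paper's bit-parallel variant organizes these steps inside the cache is not shown here.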