The Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either support only limited parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology that efficiently accelerates NTT across various parameter settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state-of-the-art.
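To make the role of NTT in polynomial multiplication concrete, the following is a minimal software sketch, not the in-cache hardware design described above. It assumes toy parameters chosen purely for illustration (modulus `Q = 17`, length `N = 8`, root of unity `OMEGA = 2`), uses a plain O(n^2) transform for readability, and multiplies modulo x^N - 1 (cyclic), whereas lattice schemes typically work modulo x^N + 1. The helper `mod_mul` is a generic textbook interleaved shift-and-add modular multiplication that avoids division; it is a stand-in for the idea and should not be read as the paper's bit-parallel multiplier. All function names are hypothetical.

```python
Q = 17      # small prime modulus (illustrative assumption)
N = 8       # transform length; N must divide Q - 1
OMEGA = 2   # primitive N-th root of unity mod Q: 2**4 % 17 == 16, 2**8 % 17 == 1


def mod_mul(a, b, q=Q):
    """Return a*b mod q using only shifts, adds and conditional subtracts.

    Assumes 0 <= a < q and 0 <= b < q, so no division is ever needed.
    """
    assert 0 <= a < q and 0 <= b < q
    acc = 0
    for i in reversed(range(q.bit_length())):  # scan bits of b, MSB first
        acc <<= 1                  # double the running result
        if acc >= q:
            acc -= q
        if (b >> i) & 1:           # add a when the current bit of b is set
            acc += a
            if acc >= q:
                acc -= q
    return acc


def mod_pow(base, exp, q=Q):
    """Square-and-multiply exponentiation built on mod_mul (base < q)."""
    result = 1
    while exp:
        if exp & 1:
            result = mod_mul(result, base, q)
        base = mod_mul(base, base, q)
        exp >>= 1
    return result


def ntt(a, omega, q=Q):
    """Naive O(n^2) NTT: out[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    out = []
    for k in range(n):
        acc = 0
        for j, coeff in enumerate(a):
            acc += mod_mul(coeff, mod_pow(omega, (j * k) % n, q), q)
            if acc >= q:
                acc -= q
        out.append(acc)
    return out


def poly_mul_cyclic(a, b, q=Q, omega=OMEGA):
    """Multiply two length-n polynomials modulo (x^n - 1, q) via the NTT."""
    n = len(a)
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    fc = [mod_mul(x, y, q) for x, y in zip(fa, fb)]  # pointwise product
    omega_inv = mod_pow(omega, n - 1, q)  # omega^(n-1) is omega^(-1)
    n_inv = mod_pow(n, q - 2, q)          # Fermat inverse of n (needs n < q)
    return [mod_mul(c, n_inv, q) for c in ntt(fc, omega_inv, q)]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 8, 0, 0, 0, 0]
    print(poly_mul_cyclic(a, b))
```

A practical implementation would replace the quadratic transform with an O(n log n) butterfly network and use parameters from an actual lattice scheme; the sketch only shows how pointwise multiplication in the NTT domain corresponds to polynomial multiplication in the coefficient domain.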