The Number Theoretic Transform (NTT) is an essential mathematical tool for polynomial multiplication in promising lattice-based cryptographic schemes. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either support only a limited set of parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm co-design methodology that uses in-cache computing to accelerate NTT efficiently across various parameter settings. By leveraging an optimized bit-parallel modular multiplication and introducing cost-free shift operations, our solution achieves up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule than the state of the art.
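As a concrete illustration of the ingredients named above, here is a minimal Python sketch of NTT-based polynomial multiplication modulo x^N + 1, the negacyclic ring common in lattice-based schemes. It is not the paper's in-cache design. The toy parameters q = 17, N = 8 and psi = 3 (a primitive 2N-th root of unity mod 17) are assumptions chosen for readability. The helper `modmul_shift_add` is a generic bit-serial (interleaved) shift-and-add modular multiplier. It avoids division by using only shifts, additions and conditional subtractions, and it stands in for, but is not, the optimized bit-parallel multiplier the paper proposes. All function names are hypothetical.

```python
# Toy parameters (assumptions for illustration): q is prime, N divides (q - 1) / 2,
# and PSI has multiplicative order 2N mod q, so PSI**N == -1 (mod q).
Q = 17
N = 8
PSI = 3


def modmul_shift_add(a, b, q):
    """Interleaved shift-and-add modular multiplication (a generic bit-serial
    technique, not the paper's bit-parallel design): a*b mod q with no division."""
    assert 0 <= a < q and 0 <= b < q
    r = 0
    for i in reversed(range(b.bit_length())):  # scan bits of b, MSB first (Horner)
        r <<= 1                                 # r = 2r, now r < 2q
        if r >= q:
            r -= q
        if (b >> i) & 1:
            r += a                              # r < 2q again
            if r >= q:
                r -= q
    return r


def mod_add(x, y, q):
    s = x + y
    return s - q if s >= q else s


def mod_sub(x, y, q):
    d = x - y
    return d + q if d < 0 else d


def ntt(a, omega, q):
    """Recursive radix-2 Cooley-Tukey NTT: A[k] = sum_j a[j] * omega^(j*k) mod q.
    Each level depends on the results of the level below (butterfly data dependencies)."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = modmul_shift_add(omega, omega, q)
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = modmul_shift_add(w, odd[k], q)      # twiddle factor times odd half
        out[k] = mod_add(even[k], t, q)         # butterfly: E + w^k * O
        out[k + n // 2] = mod_sub(even[k], t, q)  # butterfly: E - w^k * O
        w = modmul_shift_add(w, omega, q)
    return out


def poly_mul_negacyclic(a, b, q=Q, psi=PSI):
    """Multiply a(x) * b(x) mod (x^n + 1, q) via pre-weighting by psi^j, NTT,
    pointwise product, inverse NTT, and post-weighting by psi^-j."""
    n = len(a)
    omega = modmul_shift_add(psi, psi, q)       # primitive n-th root of unity
    # Precomputed constants; Fermat inverses since q is prime.
    psi_pows = [pow(psi, j, q) for j in range(n)]
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    psi_inv = pow(psi, q - 2, q)
    a_hat = ntt([modmul_shift_add(x, p, q) for x, p in zip(a, psi_pows)], omega, q)
    b_hat = ntt([modmul_shift_add(x, p, q) for x, p in zip(b, psi_pows)], omega, q)
    c_hat = [modmul_shift_add(x, y, q) for x, y in zip(a_hat, b_hat)]
    c = ntt(c_hat, omega_inv, q)                # inverse NTT, up to the 1/n factor
    return [modmul_shift_add(modmul_shift_add(x, n_inv, q), pow(psi_inv, j, q), q)
            for j, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q=Q):
    """Reference O(n^2) product mod (x^n + 1, q), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [3, 0, 16, 1, 0, 2, 5, 9]
    print(poly_mul_negacyclic(a, b))
    print(schoolbook_negacyclic(a, b))
```

Comparing the NTT result against `schoolbook_negacyclic` checks the sketch. The pointwise product replaces the O(N^2) convolution, while the recursive butterflies in `ntt` show the data dependencies that complicate hardware mapping.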