Lattice-based cryptographic algorithms built on the ring learning with errors (RLWE) problem are gaining importance because of their potential to provide post-quantum security. However, these algorithms rely on complex polynomial operations, of which polynomial modular multiplication (PMM) is the most time-consuming. Accelerating PMM is therefore crucial if lattice-based cryptographic algorithms are to be widely adopted. This work introduces X-Poly, a novel high-throughput and compact PMM accelerator based on crossbar (XB)-type compute-in-memory (CIM). We identify the PMM algorithm best suited to XB-CIM. We then propose a novel bit-mapping technique that reduces the area and energy of the XB-CIM fabric, and apply processing engine (PE)-level optimizations that increase memory utilization and support different problem sizes with a fixed number of XB arrays. The X-Poly design achieves a throughput of 3.1 × 10^6 PMM operations/s and a 200× latency improvement over a CPU-based implementation. It also achieves a 3.9× improvement in throughput per area over state-of-the-art CIM accelerators.
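For readers unfamiliar with the operation being accelerated, the following is a minimal sketch of PMM, assuming the ring Z_q[x]/(x^n + 1) that is common in RLWE-based schemes. The abstract does not state the ring, the modulus, or which PMM algorithm X-Poly adopts, so the function name `pmm_schoolbook` and the parameters below are illustrative assumptions. This is the plain O(n^2) schoolbook method, shown only to make the operation concrete; it is not the accelerator's algorithm.

```python
def pmm_schoolbook(a, b, q):
    """Multiply polynomials a and b in Z_q[x]/(x^n + 1).

    a, b: coefficient lists of equal length n, lowest degree first.
    q:    coefficient modulus (assumed parameter, not from the abstract).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of coefficients")
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                # Term stays below degree n: accumulate directly.
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1 in this ring, so wrapped terms are subtracted.
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


# Example: (1 + 2x)(3 + 4x) mod (x^2 + 1, 17)
result = pmm_schoolbook([1, 2], [3, 4], 17)
```

The doubly nested loop is what makes PMM the dominant cost for large n, and is the kind of multiply-accumulate workload that hardware accelerators aim to parallelize.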