Polynomial multiplication is a fundamental kernel in Fully Homomorphic Encryption (FHE) and post-quantum cryptography (PQC), and it is commonly accelerated with Number Theoretic Transforms (NTTs). To avoid the cost of designing dedicated cryptographic accelerators, recent efforts have mapped NTT computations onto existing systolic matrix engines, enabling the reuse of AI hardware for cryptographic workloads. In this work, we take the opposite approach: instead of reshaping the computation to fit the matrix engine, we extend the engine to multiply polynomials directly. We observe that the wavefront dataflow of systolic arrays naturally aligns with the accumulation pattern of polynomial multiplication, and we leverage this correspondence to design MPX, a dual-mode systolic array that supports both matrix multiplication and direct polynomial multiplication within the same hardware fabric. Experimental results show that extending a conventional systolic array with this dual-mode capability requires only 20% additional area and introduces negligible power overhead during matrix-multiplication execution. In polynomial-multiplication mode, MPX achieves more than 1.2x lower latency than NTT-based polynomial multiplication on systolic matrix engines.
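To make the correspondence between wavefronts and polynomial accumulation concrete, the following is a minimal software sketch, not the MPX hardware design. It assumes a grid of partial products `a[i] * b[j]` and sums each anti-diagonal `i + j == k` to produce output coefficient `k`, which is the accumulation pattern of direct (schoolbook) polynomial multiplication. The function name `polymul_wavefront` and the modulus parameter `q` are hypothetical and chosen only for illustration.

```python
def polymul_wavefront(a, b, q):
    """Multiply polynomials a and b modulo q.

    a and b are coefficient lists, lowest degree first.
    Illustrative sketch only: the loop order mimics a wavefront
    sweeping the anti-diagonals of the partial-product grid.
    """
    n, m = len(a), len(b)
    out = [0] * (n + m - 1)
    for k in range(n + m - 1):
        # Wavefront k covers every grid cell (i, j) with i + j == k.
        acc = 0
        for i in range(max(0, k - m + 1), min(n, k + 1)):
            acc += a[i] * b[k - i]
        out[k] = acc % q
    return out


# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(polymul_wavefront([1, 2], [3, 4], 17))  # prints [3, 10, 8]
```

Each output coefficient depends on exactly one anti-diagonal, so the anti-diagonals are independent of one another. This sketch does not apply the ring reduction (for example modulo a fixed polynomial) that FHE and PQC schemes typically require after the product.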