FrodoKEM is a lattice-based post-quantum key encapsulation mechanism (KEM) that has been considered for standardization by the International Organization for Standardization (ISO) owing to its robust security profile. However, hardware implementations of FrodoKEM suffer from high latency and heavy resource consumption, which hinders practical deployment. Moreover, diverse usage scenarios require comprehensive functionality. To address these challenges, this paper presents a high-performance, efficient crypto-processor for FrodoKEM. A multiple-instruction overlapped execution scheme enables efficient scheduling across modules and minimizes operational latency. A high-speed, reconfigurable parallel multiplier array handles the intensive matrix computations under diverse computation patterns, significantly improving hardware efficiency. In addition, a compact memory scheduling strategy shortens the lifetime of intermediate matrices, reducing overall storage requirements. The proposed design supports all FrodoKEM security levels and protocol phases. On an Artix-7 FPGA, it consumes 13,467 LUTs, 6,042 FFs, and 14 BRAMs and achieves the fastest execution time reported to date. Compared with state-of-the-art hardware implementations, it improves the area-time product (ATP) by a factor of 1.75 to 2.00.
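FrodoKEM's dominant cost is matrix arithmetic modulo a power-of-two modulus q = 2^D: key generation computes B = A·S + E, while encapsulation computes B' = S'·A + E', which multiplies the same large matrix A from the other side. The following Python sketch is a software illustration, not the paper's hardware design. It shows both product orientations and how generating A one row at a time keeps only a single row of A live, which is one way to shorten an intermediate matrix's lifetime. It assumes FrodoKEM-640-like shapes (n = 640, n̄ = 8, D = 15) but is tested on smaller sizes. The row generator `gen_row_hypothetical` is a hypothetical, insecure LCG standing in for FrodoKEM's SHAKE128/AES-128 expansion of A, and all function names are illustrative.

```python
from typing import Callable, List

Matrix = List[List[int]]
RowGen = Callable[[int, int, int, int], List[int]]


def gen_row_hypothetical(seed: int, i: int, n: int, q: int) -> List[int]:
    """HYPOTHETICAL stand-in for FrodoKEM's SHAKE128/AES-128 generation of A.
    Deterministically yields row i of an n x n matrix with entries in [0, q).
    Uses a simple LCG: NOT cryptographically secure, illustration only."""
    state = (seed * 1_000_003 + i) & 0xFFFFFFFF
    row = []
    for _ in range(n):
        state = (1103515245 * state + 12345) & 0x7FFFFFFF
        row.append(state % q)
    return row


def mul_add_as_plus_e(seed: int, S: Matrix, E: Matrix, n: int, nbar: int,
                      logq: int, gen_row: RowGen = gen_row_hypothetical) -> Matrix:
    """Compute B = A*S + E mod 2^logq, with S and E of shape n x nbar.
    Each row of A is generated, consumed, and discarded, so only one
    row of the n x n matrix A is held in memory at any time."""
    q = 1 << logq
    mask = q - 1  # q is a power of two, so reduction mod q is a bit mask
    B: Matrix = []
    for i in range(n):
        a_row = gen_row(seed, i, n, q)
        out_row = []
        for k in range(nbar):
            acc = E[i][k]
            for j in range(n):
                acc += a_row[j] * S[j][k]  # S holds small signed samples
            out_row.append(acc & mask)     # & on negative ints still gives acc mod q
        B.append(out_row)
    return B


def mul_add_sa_plus_e(seed: int, Sp: Matrix, Ep: Matrix, n: int, mbar: int,
                      logq: int, gen_row: RowGen = gen_row_hypothetical) -> Matrix:
    """Compute B' = S'*A + E' mod 2^logq, with S' and E' of shape mbar x n.
    Row i of A is scaled by column i of S' and accumulated into every
    output row, so A is again streamed row by row."""
    q = 1 << logq
    mask = q - 1
    acc = [row[:] for row in Ep]  # mbar x n accumulators, seeded with E'
    for i in range(n):
        a_row = gen_row(seed, i, n, q)
        for k in range(mbar):
            s = Sp[k][i]
            if s:  # skip zero coefficients of the small secret
                acc_k = acc[k]
                for j in range(n):
                    acc_k[j] += s * a_row[j]
    return [[v & mask for v in row] for row in acc]
```

Both functions read A only in row order, so a streaming generator can feed either orientation. How the paper's reconfigurable multiplier array maps these two patterns onto hardware is not described here, and this sketch does not attempt to model it.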