Shor's algorithm proved that asymmetric cryptographic protocols based on the integer factorization and discrete logarithm problems are no longer safe in a world with large-scale quantum computers. As a result, Post-Quantum Cryptography (PQC) has been developed over the last few years, seeking cryptographic primitives resistant to quantum attacks. One of the main hard problems underlying PQC schemes is the Learning with Errors (LWE) problem, which is significantly more computationally intensive than its classical predecessors. In this work, we present a Key Encapsulation Mechanism (KEM) based on plain LWE and develop a GPU-oriented implementation using OpenACC. We evaluate the performance of our accelerated application in terms of both time-to-solution and energy-to-solution, considering bare-metal and containerized executions across multiple NVIDIA GPU models and generations. Our implementation achieves significant acceleration across all tested GPU platforms. In particular, on the NVIDIA Grace Hopper Superchip, it attains up to a $208\times$ speedup over a multithreaded CPU baseline and enables the execution of problem sizes that are impractical on CPU architectures due to memory and synchronization constraints. Energy consumption analysis also shows $\approx 2\times$ better efficiency when using the Superchip compared to systems equipped with x86-based CPUs and NVIDIA H100 GPUs. These results highlight the effectiveness of GPU acceleration for computationally demanding LWE-based cryptographic workloads.