Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance on processor-centric architectures (e.g., CPUs and GPUs) is often constrained, primarily by the memory bottleneck: data must be moved between memory and the processor before it can be processed. This constraint increases latency and energy consumption, particularly when large volumes of data are processed. To address these challenges, Processing-in-Memory (PIM) has emerged as a promising architectural paradigm that allows computation to occur directly within or near memory units. By minimizing data movement between the processor and memory, PIM can significantly accelerate cryptographic algorithms while improving energy efficiency. Several prior works have demonstrated the effectiveness of PIM in accelerating cryptographic algorithms. However, none of them has extensively evaluated this potential on a real-world PIM system. In this paper, we investigate the potential and limitations of real-world PIM for accelerating cryptographic algorithms. We use the UPMEM PIM architecture to assess how these algorithms scale. When they run on a single rank, their performance remains below that of modern CPUs. However, distributing the computation across multiple ranks significantly improves performance, and when all available ranks are used, real-world PIM accelerates these cryptographic algorithms most effectively.