Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems are inherently limited: their hardware is more constrained than that of conventional processors (CPUs, GPUs), because processing elements are difficult and costly to build near or inside memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., for activation functions in machine learning applications. To support transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We implement TransPimLib for the UPMEM PIM architecture and thoroughly evaluate its methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.
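
To make the two families of methods named above concrete, the following is a minimal illustrative sketch of rotation-mode CORDIC and of a lookup table (LUT) with linear interpolation, both computing sine. It is not TransPimLib's implementation or API: the function names, the iteration count, the table size, and the use of floating-point arithmetic (a PIM implementation would more likely use fixed-point shifts and adds) are all assumptions made for illustration.

```python
import math

# --- CORDIC (rotation mode); all names and constants are hypothetical ---
N_ITER = 24  # assumed iteration count; more iterations give more accuracy

# Elementary rotation angles atan(2^-i), precomputed once.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Each micro-rotation scales the vector by sqrt(1 + 2^-2i); K undoes the total gain.
K = 1.0
for i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta, for theta in [-pi/2, pi/2]."""
    x, y, z = K, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward the remaining angle
        # In fixed point, multiplying by 2^-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x


# --- LUT with linear interpolation; table size is an assumption ---
LUT_SIZE = 1024
STEP = (math.pi / 2) / LUT_SIZE
SIN_LUT = [math.sin(i * STEP) for i in range(LUT_SIZE + 1)]


def lut_sin(theta):
    """Return an approximation of sin(theta), for theta in [0, pi/2]."""
    pos = theta / STEP
    idx = min(int(pos), LUT_SIZE - 1)  # clamp so idx + 1 stays in range
    frac = pos - idx
    return SIN_LUT[idx] + frac * (SIN_LUT[idx + 1] - SIN_LUT[idx])
```

The sketch shows the basic trade-off between the two approaches: CORDIC needs only a small table of angles but runs one iteration per bit of accuracy, whereas the LUT answers with a single lookup and one interpolation at the cost of storing the table in memory.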