Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have more constrained hardware than conventional processors (CPUs, GPUs), because building processing elements near or inside memory is difficult and costly. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads; for example, they underlie activation functions in machine learning applications. To support transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We implement TransPimLib for the UPMEM PIM architecture and thoroughly evaluate its methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.
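To give a concrete sense of the two method families named above, the following is a minimal sketch, not TransPimLib's actual code or API. It assumes a Q3.28 fixed-point format, 24 CORDIC iterations, and a 1024-entry table over $[0, \pi/2]$; all names (\texttt{cordic\_sin\_cos}, \texttt{lut\_sin}, \texttt{FRAC\_BITS}, etc.) are hypothetical and chosen only for illustration. The CORDIC loop uses only integer additions, shifts, and a small precomputed table, which is why this style of method suits processing elements with limited instruction sets; the LUT variant trades memory for fewer operations.

```python
import math

# --- CORDIC (rotation mode), integer-only inner loop -------------------
FRAC_BITS = 28                 # assumed Q3.28 fixed-point format
ONE = 1 << FRAC_BITS
N_ITER = 24                    # assumed iteration count

# Precomputed rotation angles atan(2^-i), stored in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Inverse CORDIC gain, folded into the initial x so no multiply is needed later.
_gain = 1.0
for _i in range(N_ITER):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(_gain * ONE)

def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    x, y, z = K_FIXED, 0, round(theta * ONE)
    for i in range(N_ITER):
        # Rotate toward z == 0 using only shifts and adds.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE

# --- LUT with linear interpolation -------------------------------------
LUT_SIZE = 1024                # assumed number of table intervals
LUT_LO, LUT_HI = 0.0, math.pi / 2
LUT_STEP = (LUT_HI - LUT_LO) / LUT_SIZE
SIN_LUT = [math.sin(LUT_LO + i * LUT_STEP) for i in range(LUT_SIZE + 1)]

def lut_sin(theta):
    """Approximate sin(theta) for theta in [0, pi/2] from the table."""
    pos = (theta - LUT_LO) / LUT_STEP
    i = min(int(pos), LUT_SIZE - 1)      # clamp so i + 1 stays in range
    frac = pos - i
    return SIN_LUT[i] + frac * (SIN_LUT[i + 1] - SIN_LUT[i])
```

In this sketch, accuracy is tuned by \texttt{N\_ITER} for CORDIC and by \texttt{LUT\_SIZE} for the table, which mirrors the performance/accuracy trade-off that the evaluation examines.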