Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have more constrained hardware than conventional processors (CPU, GPU), because building processing elements near or inside the memory is difficult and costly. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. To support transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and lookup-table-based (LUT-based) methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We implement TransPimLib for the UPMEM PIM architecture and thoroughly evaluate its methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.
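To make the two families of methods named above concrete, the following is a minimal sketch of a CORDIC sine/cosine and an interpolated LUT sine. It is written in Python for readability and is not TransPimLib's code or API: the function names (`cordic_sin_cos`, `lut_sin`), the fixed-point format (`FRAC_BITS`), the iteration count, and the table size are all assumptions chosen for illustration. The CORDIC loop uses only integer additions, shifts, and a small arctangent table, which is the kind of operation mix a limited PIM instruction set can support.

```python
import math

# --- Assumed parameters (illustrative only, not taken from TransPimLib) ---
FRAC_BITS = 28            # fixed-point fractional bits
ITERATIONS = 24           # number of CORDIC rotation steps
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

# Precomputed arctan(2^-i) values in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The rotations scale the vector by a known gain; start from its inverse.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
INV_GAIN = round(_gain * ONE)


def cordic_sin_cos(angle):
    """Return (sin, cos) of an angle in [-pi/2, pi/2] using rotation-mode CORDIC."""
    if not -math.pi / 2 <= angle <= math.pi / 2:
        raise ValueError("angle must lie in [-pi/2, pi/2]")
    x, y = INV_GAIN, 0
    z = round(angle * ONE)        # residual angle still to be rotated
    for i in range(ITERATIONS):
        if z >= 0:                # rotate counter-clockwise by arctan(2^-i)
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:                     # rotate clockwise by arctan(2^-i)
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return y / ONE, x / ONE


# --- LUT-based alternative with linear interpolation (assumed table size) ---
LUT_SIZE = 1024
LUT_STEP = (math.pi / 2) / LUT_SIZE
SIN_LUT = [math.sin(k * LUT_STEP) for k in range(LUT_SIZE + 1)]


def lut_sin(angle):
    """Return sin(angle) for an angle in [0, pi/2] from a precomputed table."""
    if not 0.0 <= angle <= math.pi / 2:
        raise ValueError("angle must lie in [0, pi/2]")
    pos = angle / LUT_STEP
    k = min(int(pos), LUT_SIZE - 1)   # index of the table entry at or below pos
    frac = pos - k                    # distance to that entry, in table steps
    return SIN_LUT[k] + frac * (SIN_LUT[k + 1] - SIN_LUT[k])
```

The sketch illustrates the basic trade-off between the two approaches: CORDIC needs only a tiny table but runs one iteration per bit of accuracy, while the LUT answers with one or two memory reads at the cost of storing the table. In this sketch the tables are built with floating-point `math` calls; on a real PIM system they would presumably be precomputed elsewhere and loaded into memory.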