Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, the hardware of current real-world PIM systems is more constrained than that of conventional processors (CPU, GPU), because building processing elements near or inside the memory is difficult and costly. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., as activation functions in machine learning applications. To provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.