Processing-in-Memory (PIM) architectures are evolving to minimize data movement by using the same physical devices for both memory and logic. While analog PIM harnesses crossbar arrays for efficient approximate matrix-vector multiplication, digital PIM architectures support massively parallel bitwise operations for more general workloads. Recent works have extended digital PIM towards the full-precision acceleration of convolutional neural networks (CNNs), yet the literature still lacks a comprehensive comparison with GPUs, one that may illuminate the limitations of digital PIM. This paper aims to fill this gap through an updated quantitative comparison of CNN acceleration on digital PIM and on GPUs. We begin with a theoretical investigation of various PIM architectures, shedding light on their performance characteristics and constraints. Then, through a series of benchmarks ranging from memory-bound vectored arithmetic to CNN acceleration, we provide insights into digital PIM performance that may guide the acceleration of future applications.