Bit-level sparsity in neural network models harbors immense untapped potential: eliminating redundant computations on randomly distributed zero bits can significantly boost computational efficiency. Yet traditional digital SRAM-based processing-in-memory (SRAM-PIM) architectures, constrained by their rigid crossbar structure, struggle to exploit this unstructured sparsity effectively. To address this challenge, we propose Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework. On the algorithm side, we propose a method built around a distinctive sparsity pattern, termed a dyadic block (DB), which preserves the random distribution of non-zero bits to maintain accuracy while restricting their number in each weight to improve regularity. On the architecture side, we develop a custom PIM macro comprising dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, tailored specifically for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further improves performance and energy efficiency by exploiting block-wise input sparsity. Results show that the proposed co-design framework achieves a speedup of up to 7.69x and energy savings of 83.43%.
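To make the per-weight bit budget concrete, the following Python sketch is our own illustration, not the paper's actual algorithm: each integer weight is recoded into Canonical Signed Digit (non-adjacent) form and then truncated to at most k non-zero digits, so the dominant bits keep their random positions while their count is capped. The helper names to_csd and dyadic_quantize, and the choice k = 2, are hypothetical.

```python
def to_csd(w: int) -> list[int]:
    """Recode an integer into Canonical Signed Digit (non-adjacent) form.

    Returns digits in {-1, 0, +1}, least-significant first; no two
    adjacent digits are non-zero, which minimizes the non-zero count.
    """
    digits = []
    while w != 0:
        if w & 1:
            d = 2 - (w % 4)          # +1 if w ≡ 1 (mod 4), else -1
            digits.append(d)
            w -= d                   # makes w even before halving
        else:
            digits.append(0)
        w //= 2
    return digits

def dyadic_quantize(w: int, k: int = 2) -> int:
    """Keep only the k most-significant non-zero CSD digits of w,
    mimicking a fixed per-weight budget of non-zero bits."""
    digits = to_csd(w)
    nonzero = [i for i, d in enumerate(digits) if d != 0]
    kept = set(nonzero[-k:])         # highest-weight digits dominate the value
    return sum(d << i for i, d in enumerate(digits) if i in kept)

# Example: every weight collapses to at most k signed power-of-two terms.
for w in (93, -57, 7):
    print(f"{w:4d} -> {dyadic_quantize(w, k=2)}")   # 93 -> 96, -57 -> -56, 7 -> 7
```

Capping k bounds the number of shift-and-add terms any single weight can generate, which is what allows the hardware datapath to stay regular despite the random positions of the surviving bits.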
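On the architecture side, a CSD-encoded weight turns each multiplication into a handful of shifts and signed additions, and zero inputs can be skipped before they ever reach the datapath. The sketch below is a loose software model of this idea under our own assumptions; the function csd_mac and the (shift, sign) term encoding are illustrative choices, not the paper's DBMU or IPU design.

```python
def csd_mac(acts: list[int], weights: list[list[tuple[int, int]]]) -> int:
    """Dot product where each weight is stored as a short list of
    (shift, sign) CSD terms, so each multiply becomes at most k
    shift-and-add operations feeding one accumulator."""
    acc = 0
    for x, terms in zip(acts, weights):
        if x == 0:                       # skip zero activations up front,
            continue                     # loosely mirroring the IPU's block-wise skip
        for shift, sign in terms:
            acc += sign * (x << shift)   # shifter + signed adder, no multiplier
    return acc

# Weight 7 stored as 8 - 1, weight -6 as -8 + 2: two terms each.
w_csd = [[(3, +1), (0, -1)], [(3, -1), (1, +1)]]
print(csd_mac([5, 0], w_csd))            # 5*7 + 0*(-6) = 35; zero input skipped
```

Because every weight contributes the same bounded number of terms, the partial products can feed a fixed adder tree, which is the regularity the dyadic block pattern is designed to guarantee.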