Computing-in-Memory (CIM) accelerators are a promising solution for accelerating Machine Learning (ML) workloads, as they perform Matrix-Vector Multiplications (MVMs) directly in memory on crossbar arrays. Although the bit widths of the crossbar inputs and cells are very limited, most CIM compilers do not support quantization below 8 bits. As a result, a single MVM requires many compute cycles, and weights cannot be stored efficiently in a single crossbar cell. To address this problem, we propose a mixed-precision training and compilation framework for CIM architectures. The biggest challenge is the massive search space, which makes it difficult to find good quantization parameters. We therefore introduce a reinforcement-learning-based strategy that finds suitable quantization configurations balancing latency and accuracy. In the best case, our approach achieves up to a 2.48x speedup over existing state-of-the-art solutions, with an accuracy loss of only 0.086%.