General matrix multiplication (GEMM) is a fundamental operation in deep learning (DL). As DL moves increasingly toward low precision, recent works have proposed novel unary GEMM designs as an alternative to conventional binary GEMM hardware. A rigorous evaluation of recent unary and binary GEMM designs is needed to assess the potential of unary hardware for future DL compute. This paper focuses on unary GEMM designs for integer-based DL inference and performs a detailed evaluation of three recent unary design proposals, namely uGEMM, tuGEMM, and tubGEMM, comparing them against a conventional binary GEMM. Rigorous post-synthesis evaluations beyond those in prior works are performed across varying bit-widths and matrix sizes to assess each design's tradeoffs and determine optimal sweet spots. Further, we perform weight sparsity analysis across eight pretrained convolutional neural networks (CNNs) and the LLaMA2 large language model (LLM). In this work, we demonstrate how unary GEMM can be used effectively for energy-efficient compute in future edge AI accelerators.