With ever-increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has grown significantly, and dot-products remain dominant in overall computation time. Most prior work builds on the conventional dot-product, in which a weighted sum of inputs represents the neuron operation. However, an alternative formulation of the dot-product, based on angles and magnitudes in Euclidean space, has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation-time bottleneck of convolutional neural networks (CNNs). The first innovation is an approximate dot-product built on computations in Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic-size, content-addressable-memory-based (CAM-based) accelerator that performs these bit-wise operations and reduces CNN computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523x faster than Eyeriss and up to 3498x faster than traditional CPUs such as Intel Skylake. Furthermore, DeepCAM consumes 2.16x to 109x less energy than Eyeriss.
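To make the angle-and-magnitude view of the dot-product concrete, the following is a minimal sketch, not the paper's implementation. It uses the identity x · w = ‖x‖ ‖w‖ cos θ and estimates θ from the Hamming distance between sign-random-projection hashes. That hashing scheme is an assumption chosen for illustration, since the abstract does not say how the angle is obtained. All function names (`make_projections`, `sign_hash`, `approx_dot`) are hypothetical. Note that this software sketch still uses ordinary arithmetic to build the hashes and norms; only the comparison step (XOR plus bit count) is bit-wise.

```python
import math
import random


def make_projections(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes of dimension dim (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sign_hash(vec, projections):
    """Pack the sign of each projection of vec into one integer used as a bit string."""
    bits = 0
    for plane in projections:
        side = sum(p * v for p, v in zip(plane, vec)) >= 0.0
        bits = (bits << 1) | int(side)
    return bits


def norm(vec):
    """Euclidean magnitude of vec."""
    return math.sqrt(sum(v * v for v in vec))


def approx_dot(x, w, projections):
    """Approximate x . w as |x| |w| cos(theta), with theta taken from a Hamming distance."""
    num_bits = len(projections)
    # Bit-wise comparison step: XOR the two hashes and count the differing bits.
    hamming = bin(sign_hash(x, projections) ^ sign_hash(w, projections)).count("1")
    # For sign random projections, the expected fraction of differing bits is theta / pi.
    theta = math.pi * hamming / num_bits
    return norm(x) * norm(w) * math.cos(theta)


if __name__ == "__main__":
    planes = make_projections(dim=4, num_bits=1024, seed=42)
    x = [0.5, -1.0, 2.0, 0.25]
    w = [1.0, 0.5, 1.5, -0.75]
    exact = sum(a * b for a, b in zip(x, w))
    print("exact:", exact, "approx:", approx_dot(x, w, planes))
```

In a sketch like this, the hashes and norms of the weights could be computed once ahead of time, leaving the XOR-and-count comparison as the per-input work. Using more hash bits tightens the angle estimate at the cost of longer bit strings.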