Deep Neural Networks (DNNs) are the de facto algorithm for tackling cognitive tasks in real-world applications such as speech recognition and natural language processing. DNN inference consists largely of dot products between inputs and weights, which require many multiplications and memory accesses; these limit performance and increase energy consumption when DNNs are evaluated on modern CPUs. In this work, we leverage the high degree of similarity between consecutive inputs in different DNN layers to improve the performance and energy efficiency of DNN inference on CPUs. To this end, we propose ReuseSense, a new hardware scheme that includes ReuseSensor, an engine that, on sensing similar inputs, efficiently generates the compute and load instructions needed to evaluate a DNN layer. By reusing previously computed product values, ReuseSense bypasses computations whenever an input value is identical to the previous one. It also avoids redundant memory traffic by skipping the weight loads associated with the bypassed dot product computations. Our experiments show that, on average across several DNNs, ReuseSense achieves an 8x speedup and a 74% reduction in total energy consumption over the baseline.
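The reuse idea can be illustrated in software with a minimal sketch. It is not the ReuseSense hardware or the ReuseSensor instruction generator; it assumes a fully connected layer with integer (quantized) inputs, and it assumes that the previous input and output of the layer are kept so that only the weight columns of changed inputs are touched. The names `dense_full` and `dense_with_reuse` are hypothetical.

```python
def dense_full(weights, x):
    """Baseline: every weight is loaded and multiplied.

    weights[j][i] is the weight connecting input i to output j.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def dense_with_reuse(weights, x, prev_x, prev_y):
    """Evaluate the layer for x by correcting the previous output prev_y.

    Assumption (not from the source): prev_x and prev_y are buffered
    from the previous evaluation of the same layer.
    Returns the new output and the number of weight columns accessed.
    """
    y = list(prev_y)
    columns_touched = 0
    for i, (xi, pi) in enumerate(zip(x, prev_x)):
        if xi == pi:
            # Identical input: its products are already folded into prev_y,
            # so both the multiplications and the weight loads are skipped.
            continue
        columns_touched += 1
        delta = xi - pi
        for j, row in enumerate(weights):
            y[j] += row[i] * delta  # replace the old product with the new one
    return y, columns_touched


# Two consecutive inputs that differ in a single position.
weights = [[1, 2, 3, 4], [0, -1, 2, 1], [5, 0, 0, -2]]
x_prev = [3, 0, 7, 2]
x_next = [3, 0, 6, 2]

y_prev = dense_full(weights, x_prev)
y_next, touched = dense_with_reuse(weights, x_next, x_prev, y_prev)
print(y_next, touched)
```

In this sketch the work per evaluation scales with the number of changed inputs rather than with the layer width, which is the property the abstract attributes to input similarity. Exact equality is only meaningful here because the inputs are integers; with floating-point activations the comparison and the accumulated corrections would need more care.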