Deep neural networks (DNNs) are widely deployed in many fields. Owing to the in-situ computation (processing-in-memory) capability of the Resistive Random Access Memory (ReRAM) crossbar, ReRAM-based accelerators show potential for accelerating DNNs with low power and high performance. However, despite this power advantage, such accelerators suffer from the high power consumption of their peripheral circuits, especially the Analog-to-Digital Converters (ADCs), which account for over 60 percent of total power consumption. This problem prevents ReRAM-based accelerators from achieving higher efficiency. We observe that some Analog-to-Digital conversion operations are redundant: they contribute nothing to maintaining inference accuracy and can be eliminated by modifying the ADC search logic. Based on this observation, we propose an algorithm-hardware co-design method that spans both hardware design and quantization algorithms. First, we examine the distribution of outputs along the crossbar's bit-lines and identify fine-grained redundant ADC sampling bits. To further compress the ADC bits, we propose a hardware-friendly quantization method and coding scheme in which different quantization strategies are applied to partial results in different intervals. To support these two features, we propose a lightweight architectural design based on SAR-ADC. Notably, our method is not only more energy efficient but also retains the flexibility of the algorithm. Experiments demonstrate that our method achieves about a $1.6 \sim 2.3 \times$ reduction in ADC power.
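To make the idea of redundant conversion steps concrete, the following is a minimal sketch of a successive-approximation (SAR) search in which some comparison cycles are skipped. The abstract does not specify how the search logic is actually modified, so the function name `sar_convert` and the parameters `skip_msb` and `skip_lsb` are hypothetical and serve only as an illustration, not as the proposed design.

```python
def sar_convert(v_in, v_ref=1.0, n_bits=8, skip_msb=0, skip_lsb=0):
    """Binary-search an input voltage into an n_bits code.

    skip_msb (hypothetical): leading bits assumed to be 0 and not tested,
        e.g. when the bit-line output is known to lie in a low range.
    skip_lsb (hypothetical): trailing bits left unresolved (kept at 0),
        e.g. when they do not affect the quantized result.
    Returns (code, comparisons), where comparisons counts comparator cycles.
    """
    code = 0
    comparisons = 0
    # Test bits from the highest non-skipped bit down to the lowest non-skipped bit.
    for bit in range(n_bits - 1 - skip_msb, skip_lsb - 1, -1):
        trial = code | (1 << bit)
        # Voltage the internal DAC would produce for the trial code.
        v_dac = v_ref * trial / (1 << n_bits)
        comparisons += 1
        if v_in >= v_dac:
            code = trial  # keep the bit
    return code, comparisons


if __name__ == "__main__":
    print(sar_convert(0.3))                          # full 8-cycle search
    print(sar_convert(0.3, skip_msb=1, skip_lsb=2))  # 5-cycle search
```

In this sketch each skipped bit removes one comparator cycle, which is the sense in which fewer sampled bits could translate into lower conversion energy; the real savings depend on the circuit and are not modeled here.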