With the rapid advances in deep learning in recent years, hardware accelerators are increasingly deployed in safety-critical applications such as autonomous driving and robotics. Although these accelerators are usually fabricated in advanced technology nodes for high performance and energy efficiency, this also makes them more prone to timing errors under process, voltage, temperature, and aging (PVTA) variations. By revisiting the physical sources of timing errors, we show that most timing errors in the accelerator are caused by a specific subset of input patterns, which we define as critical input patterns. To improve the timing error resilience of the accelerator, we propose READ, a reliability-enhanced accelerator dataflow optimization technique that effectively reduces timing errors. READ reduces the occurrence of critical input patterns by exploring the optimal computing sequence when mapping a trained deep neural network to the accelerator. READ changes only the order of multiply-accumulate operations in a convolution, which introduces negligible hardware overhead and has no impact on accuracy. Experimental results on VGG and ResNet demonstrate a 7.8× reduction in timing error rate (TER) on average, and up to a 37.9× TER reduction for certain layers. The results also show that READ enables the accelerator to maintain accuracy over a wide range of PVTA variations, making it a promising approach for robust deep-learning design.
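To make the reordering idea concrete, the following is a minimal sketch and not the READ algorithm itself. It assumes integer (quantized) weights and non-negative activations, and it uses sign flips of the running partial sum as a hypothetical stand-in for critical input patterns, since the abstract does not specify what those patterns are. The function names `mac_trace`, `count_sign_flips`, and `positive_first_order` are illustrative only. The sketch shows that changing the accumulation order leaves the final sum unchanged while changing the sequence of intermediate values the accumulator sees.

```python
from typing import List, Sequence, Tuple


def mac_trace(weights: Sequence[int],
              activations: Sequence[int],
              order: Sequence[int]) -> Tuple[int, List[int]]:
    """Accumulate weight*activation products in the given index order.

    Returns the final sum and the list of running partial sums.
    """
    psum = 0
    trace = []
    for i in order:
        psum += weights[i] * activations[i]
        trace.append(psum)
    return psum, trace


def count_sign_flips(trace: Sequence[int]) -> int:
    """Hypothetical proxy metric (an assumption, not the paper's definition):
    count how often the running partial sum crosses between negative and
    non-negative values. The accumulator is taken to start at 0."""
    flips = 0
    prev = 0
    for p in trace:
        if (prev < 0) != (p < 0):
            flips += 1
        prev = p
    return flips


def positive_first_order(weights: Sequence[int]) -> List[int]:
    """Illustrative heuristic only: process non-negative weights first.

    The order depends on the trained weights alone, so it can be computed
    offline when the network is mapped to the accelerator."""
    return sorted(range(len(weights)), key=lambda i: weights[i] < 0)


if __name__ == "__main__":
    weights = [3, -5, 4, -2, 3, -1]      # toy quantized weights
    activations = [2, 3, 3, 4, 3, 2]     # toy non-negative activations

    baseline = list(range(len(weights)))
    reordered = positive_first_order(weights)

    base_sum, base_trace = mac_trace(weights, activations, baseline)
    new_sum, new_trace = mac_trace(weights, activations, reordered)

    print(base_sum, count_sign_flips(base_trace))  # 2 4
    print(new_sum, count_sign_flips(new_trace))    # 2 0
```

Because integer addition is commutative and associative, both orders produce the same output, which is consistent with the claim that reordering has no impact on accuracy. With floating-point accumulation the results could differ in the last bits, so the exact-equality check here relies on the integer assumption.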