Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where hardware reliability is a primary concern. To harden DNNs against hardware faults, activation restriction techniques mitigate fault effects at the level of the DNN structure itself, independently of the accelerator architecture. State-of-the-art methods offer either neuron-wise or layer-wise clipped activation functions and determine the clipping thresholds with heuristic or learning-based approaches. Layer-wise clipped activation functions cannot preserve DNN resilience at high bit error rates, while neuron-wise clipping introduces considerable memory overhead through its added per-neuron parameters, which in turn increases the model's vulnerability to faults. Moreover, heuristic-based optimization demands numerous fault injections during the search, making threshold identification time-consuming, whereas learning-based techniques that train the thresholds of all layers simultaneously often yield sub-optimal results. In this work, we first demonstrate that neuron-wise clipped activation functions are not needed in every layer of a DNN. We then propose a hybrid clipped activation function that integrates the neuron-wise and layer-wise approaches, applying neuron-wise clipping only in the last layer of the DNN. Additionally, to obtain optimal thresholds for the clipped activation functions, we introduce ProAct, a progressive training methodology that iteratively trains the thresholds layer by layer, optimizing the threshold values of each layer separately.
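To make the hybrid scheme concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class names `LayerwiseClip`, `NeuronwiseClip`, and `HybridMLP`, the layer sizes, and the initial threshold of 6.0 are all illustrative assumptions. It shows a single shared learnable threshold per layer everywhere except the last hidden layer, which carries one learnable threshold per neuron.

```python
import torch
import torch.nn as nn

class LayerwiseClip(nn.Module):
    """Clipped ReLU with a single learnable threshold shared by the layer."""
    def __init__(self, init_threshold: float = 6.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Saturate at the threshold so abnormally large activations
        # (e.g. caused by bit flips) cannot propagate to later layers.
        return torch.minimum(torch.relu(x), self.threshold)

class NeuronwiseClip(nn.Module):
    """Clipped ReLU with one learnable threshold per neuron."""
    def __init__(self, num_features: int, init_threshold: float = 6.0):
        super().__init__()
        self.thresholds = nn.Parameter(
            torch.full((num_features,), init_threshold))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features); thresholds broadcast per neuron.
        return torch.minimum(torch.relu(x), self.thresholds)

class HybridMLP(nn.Module):
    """Toy model: layer-wise clipping in every layer except the last
    hidden layer, which uses neuron-wise clipping (hypothetical sizes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), LayerwiseClip(),
            nn.Linear(256, 128), LayerwiseClip(),
            nn.Linear(128, 64), NeuronwiseClip(64),  # last layer only
            nn.Linear(64, 10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

Restricting the per-neuron parameters to the final layer keeps the memory overhead, and hence the extra fault-prone state, to a small fraction of what fully neuron-wise clipping would require.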
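The progressive, layer-by-layer threshold training can likewise be sketched as below. This is only a skeleton under stated assumptions: `progressive_threshold_training` is a hypothetical helper, the per-layer epoch count is arbitrary, and the plain task loss `loss_fn` stands in for ProAct's actual objective and schedule, which are not reproduced here.

```python
import torch

def progressive_threshold_training(model, loss_fn, data_loader,
                                   epochs_per_layer: int = 2, lr: float = 1e-2):
    """Train clipping thresholds one layer at a time, keeping all other
    parameters (weights and the other layers' thresholds) frozen."""
    for p in model.parameters():
        p.requires_grad_(False)  # freeze everything up front

    clip_modules = [m for m in model.modules()
                    if isinstance(m, (LayerwiseClip, NeuronwiseClip))]

    for clip in clip_modules:  # progress through the layers in order
        for p in clip.parameters():
            p.requires_grad_(True)  # unfreeze this layer's threshold(s) only
        opt = torch.optim.Adam(clip.parameters(), lr=lr)
        for _ in range(epochs_per_layer):
            for x, y in data_loader:
                opt.zero_grad()
                loss = loss_fn(model(x), y)  # placeholder objective
                loss.backward()
                opt.step()
        for p in clip.parameters():
            p.requires_grad_(False)  # re-freeze before the next layer
```

Because only one layer's thresholds receive gradients at a time, each layer's values are fitted separately rather than jointly, which is the property the abstract contrasts against learning-based methods that train all layers' thresholds simultaneously.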