Neural networks have been shown to be vulnerable to fault injection attacks. These attacks alter the physical behavior of a device during computation, changing the value that is being computed at that moment. They can be realized with a range of fault injection techniques, from clock and voltage glitching to the application of lasers to rowhammer. In this paper, we explore whether fault attacks can be used to reverse engineer neural networks. SNIFF stands for sign bit flip fault, a fault that enables reverse engineering by changing the sign of intermediate values. We develop the first exact extraction method for deep-layer feature extractor networks that provably allows the recovery of the model parameters. Our experiments with the Keras library show that, with 64-bit floats, the precision error of the parameter recovery for the tested networks is less than $10^{-13}$, which improves on the current state of the art by 6 orders of magnitude. Additionally, we discuss protection techniques that can be applied to enhance resistance against fault injection attacks.
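To make the notion of a sign bit flip concrete, the following minimal sketch shows what such a fault does to a single 64-bit float. It is an illustration only, not the extraction method of the paper: the helper names `flip_sign_bit` and `relu`, the sample value, and the choice of a ReLU activation are all assumptions made for this example. It simulates the fault in software by inverting bit 63 of the IEEE 754 double representation.

```python
import struct


def flip_sign_bit(x: float) -> float:
    """Return x with bit 63 (the IEEE 754 double sign bit) inverted.

    Hypothetical helper: it simulates in software the effect that a
    physical sign bit flip fault would have on one stored value.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))[0]


def relu(x: float) -> float:
    """ReLU activation, assumed here purely for illustration."""
    return max(0.0, x)


# An intermediate value that the activation would normally suppress.
pre_activation = -0.75
faulted = flip_sign_bit(pre_activation)

# The magnitude is untouched; only the sign changes, so the faulted
# value passes through the activation instead of being zeroed.
print(relu(pre_activation), relu(faulted))
```

Because only the sign changes and the magnitude is preserved bit for bit, the faulted output carries the full 64-bit precision of the original intermediate value, which is consistent with the abstract's emphasis on recovery with very small precision error.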