Recent work has shown that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise to clean inputs. The accuracy of DNNs on adversarial examples decreases as the magnitude of the adversarial noise increases. In this study, we show that, under certain circumstances, DNNs can also be fooled when the noise is very small. We call this new type of attack the Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before they are sent to the target DNN. The trojan network acts as an amplifier of the target DNN's inherent weakness. The target DNN infected by the trojan network performs normally on clean data while being more vulnerable to adversarial examples. Because it only transforms the inputs, the trojan network can hide in DNN-based pipelines, e.g., by infecting the pre-processing step applied to the inputs before they are sent to the DNN. This new type of threat should be considered when developing safe DNNs.
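The amplifier idea can be illustrated with a toy sketch. This is not the trojan network from the study: the abstract does not specify its architecture or training, so the example substitutes a hand-written transform under stated assumptions. It assumes clean inputs are 8-bit pixel values scaled to [0, 1], so they sit exactly on a grid of multiples of 1/255, and it amplifies only the off-grid residual. The names `trojan_transform`, `target_model`, `infected_model`, and the constants `GRID` and `GAIN` are all hypothetical.

```python
# Toy illustration only: a hand-written amplifier stands in for the
# learned trojan network, whose details the abstract does not give.

GRID = 1.0 / 255.0  # assumption: clean inputs are 8-bit pixels scaled to [0, 1]
GAIN = 50.0         # assumption: hypothetical amplification factor


def quantize(x):
    """Snap each value to the nearest point on the 8-bit grid."""
    return [round(v / GRID) * GRID for v in x]


def trojan_transform(x):
    """Leave grid-aligned (clean) inputs unchanged; amplify any off-grid residual."""
    q = quantize(x)
    return [qi + GAIN * (xi - qi) for xi, qi in zip(x, q)]


def target_model(x, weights, bias):
    """Hypothetical target: a linear binary classifier returning 0 or 1."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def infected_model(x, weights, bias):
    """The target model with the trojan transform inserted as pre-processing."""
    return target_model(trojan_transform(x), weights, bias)


weights, bias = [1.0, -1.0], 0.0
clean = [130 * GRID, 128 * GRID]
# A perturbation smaller than half a quantization step per coordinate.
adversarial = [clean[0] - 0.4 * GRID, clean[1] + 0.4 * GRID]

print(target_model(clean, weights, bias))          # 1
print(infected_model(clean, weights, bias))        # 1 (clean behaviour unchanged)
print(target_model(adversarial, weights, bias))    # 1 (noise too small to matter)
print(infected_model(adversarial, weights, bias))  # 0 (amplified noise flips the label)
```

In this sketch the infected pipeline agrees with the original model on the clean input, while a perturbation too small to affect the original model flips the prediction once the pre-processing step has amplified it. A learned trojan network would presumably achieve a similar effect without relying on a fixed quantization grid.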