Deep neural networks (DNNs) are widely used in downstream tasks, including safety-critical scenarios such as autonomous driving, yet they are often threatened by adversarial samples. Adversarial perturbations can be imperceptible to the human eye while still causing a DNN to misclassify; they often transfer between deep learning and other machine learning models, and they can be realized in the real world. By the attacker's knowledge, adversarial attacks are divided into white-box attacks, in which the attacker knows the model's parameters and gradients, and black-box attacks, in which the attacker can only observe the model's inputs and outputs. By the attacker's purpose, they are divided into targeted and non-targeted attacks: a targeted attack aims to make the model misclassify the original sample into a specified class, which is the more practical goal, whereas a non-targeted attack only needs to make the model misclassify the sample. The black-box setting is a scenario we will encounter in practice.
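To make the two attack goals concrete, the following minimal sketch expresses the success criterion for targeted and non-targeted attacks under black-box access. The names `attack_succeeded` and `toy_model` are hypothetical and introduced only for illustration; the toy classifier is an assumption standing in for a real DNN, and the attacker is assumed to see only the predicted label.

```python
from typing import Callable, Optional, Sequence

# Black-box view (assumed): the attacker can only call the model
# on an input and read back the predicted class index.
Model = Callable[[Sequence[float]], int]


def attack_succeeded(
    model: Model,
    x_adv: Sequence[float],
    true_label: int,
    target_label: Optional[int] = None,
) -> bool:
    """Check whether an adversarial sample achieves the attacker's goal.

    Non-targeted (target_label is None): any misclassification counts.
    Targeted: the prediction must equal the specified target class.
    """
    pred = model(x_adv)
    if target_label is None:
        return pred != true_label
    return pred == target_label


def toy_model(x: Sequence[float]) -> int:
    # Hypothetical stand-in for a DNN: predicts the index of the largest feature.
    return max(range(len(x)), key=lambda i: x[i])


x = [0.9, 0.5, 0.1]      # original sample, classified as class 0
x_adv = [0.4, 0.5, 0.1]  # perturbed sample, classified as class 1

untargeted_ok = attack_succeeded(toy_model, x_adv, true_label=0)
targeted_ok = attack_succeeded(toy_model, x_adv, true_label=0, target_label=1)
targeted_miss = attack_succeeded(toy_model, x_adv, true_label=0, target_label=2)
```

In this sketch the same perturbed sample satisfies the non-targeted goal and the targeted goal for class 1, but not the targeted goal for class 2, which illustrates why the targeted criterion is the stricter of the two.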