Deep learning models underpin many effective image classification applications. However, they are vulnerable to adversarial attacks, which seek to mislead a model into predicting an incorrect class. Our study of major adversarial attack models shows that they all specifically target and exploit neural network structures in their designs. This observation leads us to hypothesize that most classical machine learning models, such as Random Forest (RF), are immune to these attacks because they do not rely on a neural network design at all. Our experimental study of classical machine learning models against popular adversarial attacks supports this hypothesis. Building on this result, we propose a new adversarial-aware deep learning system that uses a classical machine learning model as a secondary verification system to complement the primary deep learning model in image classification. Although the secondary classical model is less accurate, it is used only for verification, so it does not affect the output accuracy of the primary deep learning model, while still detecting an adversarial attack effectively when a clear mismatch occurs between the two models' outputs. Our experiments on the CIFAR-100 dataset show that the proposed approach outperforms current state-of-the-art adversarial defense systems.
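As an illustration only, the verification step described above might be sketched as follows. The abstract does not define what counts as a "clear mismatch", so this sketch assumes one possible rule: the input is flagged when the primary model's predicted class is not among the secondary model's top-k classes. The names `top_k` and `verify_prediction`, the score-list interface, and the default `k` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Scores = Sequence[float]  # one score per class, higher means more likely


def top_k(scores: Scores, k: int) -> List[int]:
    """Return the indices of the k highest-scoring classes, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def verify_prediction(
    image: object,
    primary_scores: Callable[[object], Scores],
    secondary_scores: Callable[[object], Scores],
    k: int = 5,
) -> Tuple[int, bool]:
    """Return (primary_label, flagged).

    The label always comes from the primary (deep learning) model, so the
    secondary model never changes the reported class. `flagged` is True when
    that label is missing from the secondary model's top-k classes, which is
    the mismatch rule assumed for this sketch.
    """
    label = top_k(primary_scores(image), 1)[0]
    flagged = label not in top_k(secondary_scores(image), k)
    return label, flagged


# Toy stand-ins for the two models (hypothetical, three classes).
def toy_primary(image: object) -> Scores:
    return [0.1, 0.7, 0.2]


def toy_secondary_agree(image: object) -> Scores:
    return [0.2, 0.5, 0.3]


def toy_secondary_disagree(image: object) -> Scores:
    return [0.6, 0.1, 0.3]


agree_result = verify_prediction(None, toy_primary, toy_secondary_agree, k=1)
disagree_result = verify_prediction(None, toy_primary, toy_secondary_disagree, k=1)
```

In practice the two callables would wrap a trained neural network and a trained classical model such as a Random Forest. A larger `k` tolerates the secondary model's lower accuracy at the cost of missing some mismatches, and how to set it is left open here.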