In this paper, we present a novel nonlinear programming-based approach to fine-tuning pre-trained neural networks that improves robustness against adversarial attacks while maintaining high accuracy on clean data. Our method introduces adversary-correction constraints to ensure correct classification of adversarial examples while minimizing changes to the model parameters. We propose an efficient cutting-plane-based algorithm that iteratively solves the resulting large-scale nonconvex optimization problem, approximating the feasible region through polyhedral cuts and balancing robustness against accuracy. Computational experiments on standard datasets such as MNIST and CIFAR-10 demonstrate that the proposed approach significantly improves robustness, even with a very small set of adversarial data, while having minimal impact on clean accuracy.
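To illustrate the general idea of approximating a feasible region with polyhedral cuts, the sketch below implements a classic one-dimensional Kelley-style cutting-plane loop. This is not the paper's algorithm (which handles large-scale nonconvex fine-tuning problems); it is a minimal, self-contained illustration in which tangent cuts build an outer polyhedral model of a convex objective, and the function and variable names are the author's own for this example.

```python
def kelleys_cutting_plane(f, grad, lo, hi, tol=1e-4, max_iter=200):
    """Minimize a convex 1-D function f over [lo, hi] by Kelley's
    cutting-plane method: the epigraph of f is approximated from
    below by a polyhedron built from tangent (gradient) cuts."""
    # Each cut is a line y = s*x + b that underestimates f everywhere.
    cuts = []
    for x0 in (lo, hi):
        s = grad(x0)
        cuts.append((s, f(x0) - s * x0))

    best_x = lo
    for _ in range(max_iter):
        # Master problem: minimize the piecewise-linear lower model
        # m(x) = max_k (s_k * x + b_k) over [lo, hi]. In 1-D the
        # minimizer lies at an endpoint or where two cuts intersect.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                si, bi = cuts[i]
                sj, bj = cuts[j]
                if si != sj:
                    x = (bj - bi) / (si - sj)
                    if lo <= x <= hi:
                        candidates.append(x)
        model = lambda x: max(s * x + b for s, b in cuts)
        best_x = min(candidates, key=model)
        gap = f(best_x) - model(best_x)  # true value vs. lower bound
        if gap <= tol:
            break
        # Refine the polyhedral model with a new tangent cut.
        s = grad(best_x)
        cuts.append((s, f(best_x) - s * best_x))
    return best_x


# Example: minimize f(x) = x^2 on [-2, 2]; the cuts converge to x = 0.
x_star = kelleys_cutting_plane(lambda x: x * x, lambda x: 2.0 * x, -2.0, 2.0)
```

The same outer-approximation principle, iteratively tightening a polyhedral relaxation until the gap between the true objective and its model closes, underlies the larger-scale procedure described in the abstract.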