Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates

Machine-learning models require periodic updates to improve their average accuracy, exploiting novel architectures and additional data. However, a newly updated model may make mistakes that the previous model did not. Such misclassifications are referred to as negative flips, and users experience them as a regression in performance. In this work, we show that this problem also affects robustness to adversarial examples, thereby hindering the development of secure model update practices. In particular, when a model is updated to improve its adversarial robustness, some previously ineffective adversarial examples may become misclassified, causing a regression in the perceived security of the system. We propose a novel technique, named robustness-congruent adversarial training, to address this issue. It amounts to fine-tuning a model with adversarial training, while constraining it to retain higher robustness on the adversarial examples that were correctly classified before the update. We show that our algorithm and, more generally, learning with non-regression constraints provide a theoretically grounded framework to train consistent estimators. Our experiments on robust models for computer vision confirm that (i) both accuracy and robustness, even if improved after a model update, can be affected by negative flips, and (ii) our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.
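As an illustration of the negative-flip notion described above, the following is a minimal sketch, not the authors' evaluation code. The function name `negative_flip_rate` and the toy labels are hypothetical. It counts the samples that the old model classified correctly and the new model misclassifies, as a fraction of all samples. The same count could be applied to robustness by passing predictions made on adversarially perturbed inputs; how those inputs are generated is outside this sketch.

```python
from typing import Sequence


def negative_flip_rate(
    y_true: Sequence[int],
    old_pred: Sequence[int],
    new_pred: Sequence[int],
) -> float:
    """Fraction of samples the old model got right and the new model gets wrong."""
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        return 0.0
    flips = sum(
        1
        for y, old, new in zip(y_true, old_pred, new_pred)
        if old == y and new != y  # correct before the update, wrong after
    )
    return flips / len(y_true)


def accuracy(y_true: Sequence[int], pred: Sequence[int]) -> float:
    """Plain accuracy, used here only to contrast with the flip rate."""
    return sum(1 for y, p in zip(y_true, pred) if y == p) / len(y_true)


# Toy data (hypothetical): the update raises accuracy yet still regresses on one sample.
y_true = [0, 1, 2, 1, 0, 2]
old_pred = [0, 1, 2, 0, 1, 0]  # old model: correct on the first three samples
new_pred = [0, 1, 0, 1, 0, 2]  # new model: wrong only on the third sample

old_acc = accuracy(y_true, old_pred)
new_acc = accuracy(y_true, new_pred)
nfr = negative_flip_rate(y_true, old_pred, new_pred)
```

In this toy case the new model is more accurate on average than the old one, yet the flip rate is nonzero, which is the situation the abstract describes as a perceived regression.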

