Semantic Image Attack for Visual Model Diagnosis

In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models. This is partly because obtaining a balanced, diverse, and perfectly labeled dataset is typically expensive, time-consuming, and error-prone. Rather than relying on a carefully designed test set to assess ML models' failures, fairness, or robustness, this paper proposes Semantic Image Attack (SIA), an adversarial-attack-based method that produces semantic adversarial images for model diagnosis, interpretability, and robustness. Traditional adversarial training is a popular methodology for robustifying ML models against attacks. However, existing adversarial methods do not combine the two properties that enable interpretation and analysis of a model's flaws: semantic traceability and perceptual quality. SIA combines both via iterative gradient ascent on a predefined semantic attribute space and the image space. We illustrate the validity of our approach in three scenarios for keypoint detection and classification. (1) Model diagnosis: SIA generates a histogram of attributes that highlights the semantic vulnerability of the ML model (i.e., the attributes that make the model fail). (2) Stronger attacks: SIA generates adversarial examples with visually interpretable attributes that achieve higher attack success rates than baseline methods, and adversarial training on SIA examples improves transferable robustness across different gradient-based attacks. (3) Robustness to imbalanced datasets: we use SIA to augment the underrepresented classes, which outperforms strong augmentation and re-balancing baselines.
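The abstract describes the attack only at a high level, so the following is a minimal sketch of what "iterative gradient ascent on a semantic attribute space and the image space" could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the attribute names, the linear stand-in for an attribute-conditioned generator (`generate`), the logistic classifier used as the target model, the step sizes and bounds, and the rule that builds the histogram by counting the most-shifted attribute of each successful attack.

```python
import math
from collections import Counter

ATTRS = ["smile", "glasses", "age"]  # hypothetical attribute names


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def clip(v, lo, hi):
    return min(max(v, lo), hi)


def generate(base, directions, a):
    """Toy linear stand-in for an attribute-conditioned generator G(a)."""
    return [b + sum(aj * dj[i] for aj, dj in zip(a, directions))
            for i, b in enumerate(base)]


def loss_and_grad(w, bias, x, y):
    """Cross-entropy of a logistic classifier and its gradient w.r.t. x."""
    z = clip(dot(w, x) + bias, -30.0, 30.0)  # clip the logit for numerical safety
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss, [(p - y) * wi for wi in w]


def semantic_attack(w, bias, base, directions, a0, y,
                    steps=50, lr_a=0.5, lr_x=0.01, eps=0.05, a_max=3.0):
    """Gradient ascent on the loss over attributes a and pixel noise delta."""
    a, delta = list(a0), [0.0] * len(base)
    for _ in range(steps):
        x = [g + d for g, d in zip(generate(base, directions, a), delta)]
        _, grad_x = loss_and_grad(w, bias, x, y)
        # Chain rule through the linear generator: dL/da_j = <dL/dx, D_j>.
        grad_a = [dot(grad_x, dj) for dj in directions]
        a = [clip(aj + lr_a * gj, -a_max, a_max) for aj, gj in zip(a, grad_a)]
        # Signed step in image space, clipped to the L-inf ball of radius eps.
        delta = [clip(dk + lr_x * ((gk > 0) - (gk < 0)), -eps, eps)
                 for dk, gk in zip(delta, grad_x)]
    return a, delta


def attribute_histogram(results):
    """Count, over successful attacks, the attribute that shifted the most."""
    hist = Counter()
    for a_start, a_adv, flipped in results:
        if flipped:
            shifts = [abs(u - v) for u, v in zip(a_adv, a_start)]
            hist[ATTRS[shifts.index(max(shifts))]] += 1
    return hist


# Toy target model and generator (all values are made up for the example).
w, bias = [1.0, -0.5, 0.8, 0.2], 0.1
base = [0.5, 0.2, 0.4, 0.3]
directions = [[0.1, 0.0, 0.0, 0.0],   # "smile"
              [0.0, 0.3, -0.4, 0.0],  # "glasses"
              [0.0, 0.0, 0.0, 0.1]]   # "age"
a0 = [0.0, 0.0, 0.0]

a_adv, delta = semantic_attack(w, bias, base, directions, a0, y=1)
x_adv = [g + d for g, d in zip(generate(base, directions, a_adv), delta)]
flipped = dot(w, x_adv) + bias < 0  # true label is 1, so a negative logit is a failure
print(attribute_histogram([(a0, a_adv, flipped)]))
```

In this toy setting the attribute update carries the semantic traceability (the change in `a` says which attribute moved), while the small bounded `delta` plays the role of the image-space perturbation. A real system would replace the linear generator and logistic classifier with trained networks and obtain the gradients by automatic differentiation.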

