In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models, partly because obtaining a balanced, diverse, and perfectly labeled dataset is typically expensive, time-consuming, and error-prone. Rather than relying on a carefully designed test set to assess ML models' failures, fairness, or robustness, this paper proposes Semantic Image Attack (SIA), an adversarial-attack-based method that generates semantic adversarial images for model diagnosis, interpretability, and robustness. Traditional adversarial training is a popular methodology for robustifying ML models against attacks. However, existing adversarial methods do not combine the two properties that enable interpretation and analysis of a model's flaws: semantic traceability and perceptual quality. SIA combines both via iterative gradient ascent on a predefined semantic attribute space and the image space. We illustrate the validity of our approach in three scenarios for keypoint detection and classification. (1) Model diagnosis: SIA generates a histogram of attributes that highlights the semantic vulnerability of the ML model (i.e., attributes that make the model fail). (2) Stronger attacks: SIA generates adversarial examples with visually interpretable attributes that achieve higher attack success rates than baseline methods, and adversarial training with SIA improves robustness that transfers across different gradient-based attacks. (3) Robustness to imbalanced datasets: we use SIA to augment underrepresented classes, which outperforms strong augmentation and re-balancing baselines.
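The iterative gradient ascent over an attribute space and the image space can be pictured with a small toy sketch. This is not the authors' implementation. It assumes a linear map `G` standing in for an attribute-conditioned image generator, a logistic-regression classifier (`w`, `bias`) standing in for the target model, and signed-gradient steps with box constraints. The function name `semantic_attack`, the step sizes, and the bounds `attr_bound` and `eps` are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y):
    # Binary cross-entropy for one example; clipping avoids log(0).
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def semantic_attack(a0, y, G, w, bias, steps=20,
                    lr_attr=0.1, lr_img=0.01, attr_bound=1.0, eps=0.05):
    """Toy joint attack: ascend the loss over attributes `a` and pixels `delta`.

    Assumptions (not from the paper): G is a linear stand-in for a generator,
    the target model is logistic regression, and updates are signed-gradient
    steps projected back into box constraints.
    """
    a = a0.copy()
    delta = np.zeros(G.shape[0])
    for _ in range(steps):
        x = G @ a + delta              # toy "generated" image plus pixel perturbation
        p = sigmoid(w @ x + bias)      # model's probability of class 1
        grad_x = (p - y) * w           # d(loss)/dx for logistic regression
        grad_a = G.T @ grad_x          # chain rule back through the linear generator
        # Gradient ascent in attribute space, kept within attr_bound of the start.
        a = np.clip(a + lr_attr * np.sign(grad_a), a0 - attr_bound, a0 + attr_bound)
        # Gradient ascent in image space, kept within an L-infinity ball of radius eps.
        delta = np.clip(delta + lr_img * np.sign(grad_x), -eps, eps)
    return a, delta
```

In this sketch the attribute shift `a - a0` stays interpretable because each coordinate corresponds to one named attribute, so collecting the largest shifts over many attacked inputs would yield an attribute histogram of the kind described in scenario (1). The pixel perturbation `delta` is kept small so that, under these assumptions, most of the change is carried by the attributes.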