Deep neural networks have achieved remarkable success across a wide range of classification tasks. However, they remain highly susceptible to adversarial examples: inputs that are subtly perturbed to induce misclassification while appearing unchanged to humans. Among attack strategies, Universal Adversarial Perturbations (UAPs) have emerged as a powerful tool for both stress-testing model robustness and enabling scalable adversarial training. Despite their effectiveness, most existing UAP methods neglect the domain-specific constraints that govern relationships between features. Violating such constraints, for example debt-to-income ratios in credit scoring or packet-flow invariants in network communication, can render adversarial examples implausible or easily detectable, limiting their real-world applicability. In this work, we extend universal adversarial attacks to constrained feature spaces by formulating an augmented-Lagrangian-based min-max optimization problem that enforces multiple, potentially complex constraints of varying importance. We propose Constrained Adversarial Perturbation (CAP), an efficient algorithm that solves this problem with a gradient-based alternating optimization strategy. We evaluate CAP across diverse domains, including finance, IT networks, and cyber-physical systems, and demonstrate that it achieves higher attack success rates while significantly reducing runtime compared to existing baselines. Our approach also generalizes to individual (per-sample) adversarial perturbations, where we observe similarly strong performance gains. Finally, we introduce a principled procedure for learning feature constraints directly from data, enabling broad applicability across domains with structured input spaces.
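To make the augmented-Lagrangian min-max idea in the abstract concrete, the following is a minimal sketch and not the authors' CAP implementation. It assumes a toy linear scorer, a single linear equality constraint (the third feature equals the sum of the first two), and an L-infinity budget enforced by clipping. The function name `constrained_uap_sketch`, the weights, and all hyperparameters are hypothetical choices made for illustration. Each iteration takes one gradient descent step on the shared perturbation (the min player) and one ascent step on the multiplier (the max player).

```python
import numpy as np

def constrained_uap_sketch(X, y, w, a, eps=0.5, rho=2.0, lr=0.05, steps=1000):
    """Toy universal perturbation for a linear scorer f(x) = w @ x + b.

    Assumptions (illustrative only): labels y are in {-1, +1}, and the feature
    constraint is the single linear equality a @ x = 0, so one shared delta keeps
    every perturbed sample feasible exactly when a @ delta = 0.
    """
    delta = np.zeros(X.shape[1])
    lam = 0.0  # Lagrange multiplier for the equality constraint
    for _ in range(steps):
        # Attack objective: mean signed margin y * f(x + delta). For a linear
        # scorer its gradient w.r.t. delta is mean(y) * w; a nonlinear model
        # would need a gradient evaluated on X instead.
        grad_attack = y.mean() * w
        h = a @ delta  # constraint residual
        # Gradient of: objective + lam * h + (rho / 2) * h**2
        grad = grad_attack + (lam + rho * h) * a
        # Primal descent step, then projection onto the L-infinity ball.
        delta = np.clip(delta - lr * grad, -eps, eps)
        # Dual ascent step on the multiplier.
        lam += rho * (a @ delta)
    return delta, lam

# Synthetic data in which the third feature is the sum of the first two.
rng = np.random.default_rng(0)
x12 = rng.uniform(0.0, 1.0, size=(400, 2))
X = np.column_stack([x12, x12.sum(axis=1)])
w, b = np.array([1.0, 0.5, 0.2]), -1.0   # hypothetical model parameters
X = X[X @ w + b > 0]                     # keep samples the model scores positive
y = np.ones(len(X))
a = np.array([1.0, 1.0, -1.0])           # encodes x1 + x2 - x3 = 0

delta, lam = constrained_uap_sketch(X, y, w, a)
fooled = np.mean((X + delta) @ w + b < 0)  # fraction of samples pushed across the boundary
print("delta:", delta)
print("constraint residual:", abs(a @ delta))
print("fooling rate:", fooled)
```

In this toy setting the multiplier update drives the residual `a @ delta` toward zero, so the perturbed samples continue to satisfy the feature relationship while the shared perturbation lowers the model's score. Handling several constraints of differing importance, as the abstract describes, would presumably involve one multiplier and penalty weight per constraint; that extension is not shown here.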