Adversarial attacks pose a significant threat to modern machine learning systems. Yet existing detection methods often fail on unseen attacks, or cannot detect different attack types with a high level of accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network is deployed, enabling effective real-time adversarial detection. We generate a metric of adversarial presence by comparing the behavior of a paired compressed and uncompressed neural network. We tested our method against state-of-the-art techniques, and it achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.
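
To make the idea of a pre-deployment baseline concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes that both networks output class-probability vectors, that the "metric of adversarial presence" can be approximated by the total-variation distance between the two outputs, and that the baseline is an empirical quantile of that distance on clean calibration data. The function names `disagreement`, `fit_baseline`, and `detect`, and the parameter `target_fpr`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def disagreement(p_full, p_comp):
    """Per-input total-variation distance between two batches of
    class-probability vectors, each of shape (n_inputs, n_classes).

    Assumption: this distance stands in for the paper's metric,
    which the abstract does not specify.
    """
    p_full = np.asarray(p_full, dtype=float)
    p_comp = np.asarray(p_comp, dtype=float)
    return 0.5 * np.abs(p_full - p_comp).sum(axis=1)


def fit_baseline(clean_scores, target_fpr=0.01):
    """Before deployment: choose the threshold as the empirical
    (1 - target_fpr) quantile of the scores observed on clean inputs.
    """
    return float(np.quantile(clean_scores, 1.0 - target_fpr))


def detect(p_full, p_comp, threshold):
    """At run time: flag inputs whose score exceeds the baseline threshold."""
    return disagreement(p_full, p_comp) > threshold
```

In use, one would run both networks on held-out clean data, pass the resulting scores to `fit_baseline`, and then call `detect` on each incoming batch. Under these assumptions, the quantile-based threshold controls the false-positive rate on clean data directly, which is one simple way to trade detection sensitivity against false alarms.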