The exponential adoption of machine learning (ML) is propelling the world into a future of intelligent automation and data-driven solutions. However, the proliferation of malicious data manipulation attacks against ML, namely adversarial and backdoor attacks, jeopardizes its reliability in safety-critical applications. Existing detection methods against such attacks rest on specific assumptions, which limits their applicability in diverse practical scenarios. Motivated by the need for a more robust and unified defense mechanism, we investigate the shared traits of adversarial and backdoor attacks and propose NoiSec, which relies solely on the noise, the root cause of such attacks, to detect any malicious data alterations. NoiSec is a reconstruction-based detector that disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation. Experimental evaluations on the CIFAR10 dataset demonstrate the efficacy of NoiSec, which achieves AUROC scores exceeding 0.954 and 0.852 under white-box and black-box adversarial attacks, respectively, and 0.992 against backdoor attacks. Notably, NoiSec maintains high detection performance while keeping the false positive rate within 1%. Comparative analyses against MagNet-based baselines show that NoiSec performs better across various attack scenarios.
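The pipeline described in the abstract (reconstruct the input, isolate the residual noise, extract features from that noise, and flag anomalies) can be illustrated with a minimal sketch. This is not the NoiSec implementation: the linear (PCA) autoencoder, the random-projection feature extractor, the z-score anomaly score, the synthetic data, and every function name below are assumptions chosen only to keep the example self-contained. The threshold is set at the 99th percentile of benign calibration scores, which is one plausible way to target a false positive rate near 1%.

```python
import numpy as np


def fit_linear_autoencoder(x_benign, k):
    """Assumed stand-in for a trained autoencoder: top-k PCA basis of benign data."""
    mean = x_benign.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x_benign - mean, full_matrices=False)
    return mean, vt[:k]


def extract_noise(x, mean, basis):
    """Noise = input minus its reconstruction from the benign subspace."""
    recon = mean + (x - mean) @ basis.T @ basis
    return x - recon


def noise_features(noise, w):
    """Hypothetical feature extractor: fixed random projection followed by ReLU."""
    return np.maximum(noise @ w, 0.0)


def anomaly_scores(feats, mu, sigma):
    """Distance of standardized noise features from the benign profile."""
    return np.linalg.norm((feats - mu) / sigma, axis=1)


def run_demo(seed=0, d=32, k=4, n=1000):
    rng = np.random.default_rng(seed)
    mixing = rng.normal(size=(k, d))

    def sample_benign(m):
        # Synthetic benign inputs: low-dimensional structure plus small natural noise.
        return rng.normal(size=(m, k)) @ mixing + 0.02 * rng.normal(size=(m, d))

    x_train, x_cal, x_test = sample_benign(2 * n), sample_benign(n), sample_benign(n)
    mean, basis = fit_linear_autoencoder(x_train, k)
    w = rng.normal(size=(d, 16))

    def feats(x):
        return noise_features(extract_noise(x, mean, basis), w)

    # Benign noise-feature statistics, estimated on the training split.
    f_train = feats(x_train)
    mu, sigma = f_train.mean(axis=0), f_train.std(axis=0) + 1e-8

    # Threshold at the 99th percentile of held-out benign scores (targets ~1% FPR).
    threshold = np.quantile(anomaly_scores(feats(x_cal), mu, sigma), 0.99)

    # Toy "systematic manipulation": the same unit-norm pattern added to every input.
    trigger = rng.normal(size=d)
    trigger /= np.linalg.norm(trigger)
    x_attacked = x_test + trigger

    fpr = float(np.mean(anomaly_scores(feats(x_test), mu, sigma) > threshold))
    tpr = float(np.mean(anomaly_scores(feats(x_attacked), mu, sigma) > threshold))
    return fpr, tpr


if __name__ == "__main__":
    fpr, tpr = run_demo()
    print(f"benign false positive rate: {fpr:.3f}, detection rate: {tpr:.3f}")
```

The point of the sketch is the division of labor: benign inputs leave only unstructured residuals after reconstruction, whereas a systematic perturbation leaves a consistent residual that shows up in the noise features. A real detector of this kind would replace each assumed component with a trained model and would be evaluated on actual attacks rather than on a synthetic additive pattern.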