Black-box query-based attacks pose a significant threat to Machine Learning as a Service (MLaaS) systems because they can generate adversarial examples without access to the target model's architecture or parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or reduce test accuracy on non-adversarial inputs. To address these challenges, we propose PuriDefense, an efficient defense mechanism that applies random patch-wise purification using an ensemble of lightweight purification models at low inference cost. These models leverage local implicit functions to rebuild the natural image manifold. Our theoretical analysis suggests that incorporating randomness into purification slows down the convergence of query-based attacks. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of the proposed purifier-based defense, demonstrating significant improvements in robustness against query-based attacks.
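As a rough illustration of random patch-wise purification with an ensemble, the following minimal sketch splits an image into patches and purifies each patch with a purifier drawn at random from a pool. It rests on several assumptions that are not taken from the abstract: the names `random_patchwise_purify`, `quantize`, and `mean_blend` are hypothetical, the two purifiers are toy stand-ins rather than the local-implicit-function models described above, and drawing one purifier per patch is only one plausible reading of "random patch-wise".

```python
import random
from typing import Callable, List, Optional

Image = List[List[float]]             # grayscale image, pixel values in [0, 1]
Purifier = Callable[[Image], Image]   # maps a patch to a same-sized purified patch


def quantize(patch: Image, levels: int = 5) -> Image:
    """Toy purifier (assumption): snap each pixel to one of `levels` evenly spaced values."""
    step = 1.0 / (levels - 1)
    return [[round(v / step) * step for v in row] for row in patch]


def mean_blend(patch: Image, weight: float = 0.5) -> Image:
    """Toy purifier (assumption): pull each pixel toward the patch mean."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return [[(1 - weight) * v + weight * mean for v in row] for row in patch]


def random_patchwise_purify(
    image: Image,
    purifiers: List[Purifier],
    patch_size: int,
    rng: Optional[random.Random] = None,
) -> Image:
    """Purify each patch with a purifier chosen at random from the pool."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input image is left untouched
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            bottom = min(top + patch_size, height)
            right = min(left + patch_size, width)
            patch = [row[left:right] for row in image[top:bottom]]
            purifier = rng.choice(purifiers)  # fresh random draw for every patch
            purified = purifier(patch)
            for dy, row in enumerate(purified):
                out[top + dy][left:right] = row
    return out
```

Because the purifier is redrawn for every patch on every call, two identical queries generally produce different purified inputs unless the random generator is seeded. This is the kind of randomness that the analysis above credits with slowing the convergence of query-based attacks.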