Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense

Query-based black-box attacks have emerged as a significant threat to machine learning systems: adversaries manipulate input queries to generate adversarial examples that cause the model to misclassify. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs), which detect adversarial query sequences and reject queries that are "similar" to historical queries. Existing state-of-the-art (SOTA) SDMs (e.g., BlackLight and PIHA) have proven effective in defending against these attacks. However, recent studies have shown that they are vulnerable to Oracle-guided Adaptive Rejection Sampling (OARS), a stronger adaptive attack strategy. OARS can be easily integrated with existing attack algorithms and evades SDMs by using the decision information they leak to fine-tune the direction and step size of the perturbations in its queries. In this paper, we propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient SDMs. QPA encapsulates the historical relationships among queries as a sequence feature that captures the fundamental difference between benign and adversarial query sequences. To utilize the query provenance, we propose an efficient query provenance analysis algorithm with dynamic management. We evaluate QPA against two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms. The results show that QPA outperforms the baselines in both defense effectiveness and efficiency under non-adaptive and adaptive attacks. Specifically, QPA reduces the Attack Success Rate (ASR) of OARS to 4.08%, compared with 77.63% and 87.72% for BlackLight and PIHA, respectively. Moreover, QPA achieves 7.67x and 2.25x higher throughput than BlackLight and PIHA, respectively.
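To make the notion of a stateful defense concrete, the following is a minimal sketch of the generic idea of rejecting queries that are "similar" to earlier ones. It is not the algorithm used by BlackLight, PIHA, or QPA. The class name `SimilarityDetector`, the L2 distance metric, the threshold value, and the bounded history are all assumptions chosen for illustration.

```python
from collections import deque
from math import sqrt


class SimilarityDetector:
    """Toy stateful defense (hypothetical): reject a query that lies too close to a past query."""

    def __init__(self, threshold: float, history_size: int = 1000):
        self.threshold = threshold                  # assumed L2 distance cutoff
        self.history = deque(maxlen=history_size)   # bounded store of past queries

    @staticmethod
    def _l2(a, b) -> float:
        # Euclidean distance between two equal-length feature vectors.
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def check(self, query) -> bool:
        """Return True if the query is accepted, False if it is rejected."""
        rejected = any(self._l2(query, past) < self.threshold for past in self.history)
        self.history.append(tuple(query))           # every query is recorded, accepted or not
        return not rejected


detector = SimilarityDetector(threshold=0.1)

# Benign queries are spread out; attack queries are small perturbations of one input.
benign = [(0.0, 0.0), (1.0, 1.0), (2.0, -1.0)]
adversarial = [(5.0, 5.0), (5.01, 5.0), (5.01, 5.02)]

benign_results = [detector.check(q) for q in benign]
attack_results = [detector.check(q) for q in adversarial]
print(benign_results, attack_results)
```

In this toy setting the accept or reject outcome of each query is itself observable by the attacker. This is the kind of leaked decision information that an adaptive strategy such as OARS is described above as exploiting, and it is why a per-query similarity check alone can be evaded.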
