Many machine learning models are susceptible to adversarial attacks, and decision-based black-box attacks pose the most critical threat in real-world applications. These attacks are extremely stealthy, generating adversarial examples using only the hard labels returned by the target machine learning model. This is typically realized by optimizing perturbation directions, guided by decision boundaries identified through query-intensive exact search, which significantly limits the attack success rate. This paper introduces a novel approach that uses an Approximation Decision Boundary (ADB) to efficiently and accurately compare perturbation directions without precisely determining the decision boundaries. The effectiveness of our ADB approach (ADBA) hinges on promptly identifying a suitable ADB that reliably differentiates all perturbation directions. To this end, we analyze the probability distribution of decision boundaries and confirm that using the distribution's median value as the ADB effectively distinguishes different perturbation directions, giving rise to the ADBA-md algorithm. ADBA-md requires only four queries on average to differentiate any pair of perturbation directions, making it highly query-efficient. Extensive experiments on six well-known image classifiers clearly demonstrate the superiority of ADBA and ADBA-md over multiple state-of-the-art black-box attacks. The source code is available at https://github.com/BUPTAIOC/ADBA.
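To illustrate the core idea only (this is a minimal sketch, not the paper's implementation), the snippet below compares two perturbation directions with a handful of hard-label queries around a shared approximation boundary radius, rather than binary-searching each direction's exact boundary. All names here (`compare_directions`, `adb`, `model_label`) are hypothetical illustrations.

```python
import numpy as np

def is_adversarial(model_label, x, direction, r, true_label):
    """One hard-label query: is x perturbed by radius r along `direction` misclassified?"""
    return model_label(x + r * direction) != true_label

def compare_directions(model_label, x, true_label, d1, d2, adb, max_rounds=5):
    """Return the direction with the smaller estimated boundary radius.

    Instead of locating each direction's exact decision boundary, both
    directions are probed at a shared radius r (initialized to the
    approximation decision boundary `adb`). As soon as one direction is
    adversarial at r and the other is not, the comparison is decided;
    otherwise r is refined bisection-style and both are probed again.
    """
    r = adb
    lo, hi = 0.0, np.inf  # bracket on the smaller boundary radius
    for _ in range(max_rounds):
        a1 = is_adversarial(model_label, x, d1, r, true_label)
        a2 = is_adversarial(model_label, x, d2, r, true_label)
        if a1 != a2:
            # The direction already adversarial at r has the nearer boundary.
            return d1 if a1 else d2
        if a1 and a2:        # both succeed: the smaller boundary is below r
            hi = r
            r = (lo + hi) / 2
        else:                # both fail: the smaller boundary is above r
            lo = r
            r = 2 * r if hi == np.inf else (lo + hi) / 2
    return d1  # undecided within the query budget: keep the incumbent
```

A toy usage: against a linear hard-label classifier whose boundary lies 0.5 away along the first axis, comparing that axis direction with an orthogonal (never-adversarial) one picks the first direction after a few queries, whether the initial radius starts above or below the true boundary.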