Many machine learning models are susceptible to adversarial attacks, and decision-based black-box attacks represent the most critical threat in real-world applications. These attacks are extremely stealthy, as they generate adversarial examples using only the hard labels returned by the target model. This is typically realized by optimizing perturbation directions, guided by decision boundaries identified through query-intensive exact search, which significantly limits the attack success rate. This paper introduces a novel approach that uses an Approximation Decision Boundary (ADB) to compare perturbation directions efficiently and accurately without precisely determining the decision boundary. The effectiveness of our ADB approach (ADBA) hinges on promptly identifying a suitable ADB that reliably differentiates all perturbation directions. To this end, we analyze the probability distribution of decision boundaries and confirm that using the distribution's median as the ADB effectively distinguishes different perturbation directions, giving rise to the ADBA-md algorithm. ADBA-md requires only four queries on average to differentiate any pair of perturbation directions, making it highly query-efficient. Extensive experiments on six well-known image classifiers demonstrate the superiority of ADBA and ADBA-md over multiple state-of-the-art black-box attacks.
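The core idea above can be illustrated with a minimal sketch (not the authors' code): rather than binary-searching each direction's exact boundary distance, two candidate perturbation directions are compared with hard-label queries at a single shared candidate radius, playing the role of the ADB. The toy model, direction names, and helper functions below are all hypothetical, introduced only to make the comparison concrete.

```python
import numpy as np

def is_adversarial(f, x, y, u, r):
    """One hard-label query: does moving distance r along unit
    direction u flip the predicted label away from y?"""
    return f(x + r * u) != y

def compare_at_adb(f, x, y, u1, u2, r):
    """Compare two directions at a shared candidate radius r (the ADB).
    Returns the direction whose boundary lies within r if exactly one
    does; None means r failed to separate them (a different r, e.g. the
    median of an assumed boundary-distance distribution, is then tried)."""
    a1 = is_adversarial(f, x, y, u1, r)  # query 1
    a2 = is_adversarial(f, x, y, u2, r)  # query 2
    if a1 and not a2:
        return u1
    if a2 and not a1:
        return u2
    return None  # both or neither flipped: r is uninformative here

# Hypothetical hard-label model: class depends only on the first coordinate.
def toy_model(z):
    return int(z[0] > 1.0)

x = np.zeros(4)                      # clean input, predicted label 0
y = toy_model(x)
u_near = np.array([1.0, 0.0, 0.0, 0.0])  # boundary at distance 1.0
u_far = np.array([0.0, 1.0, 0.0, 0.0])   # never crosses the boundary
better = compare_at_adb(toy_model, x, y, u_near, u_far, r=1.5)
print(better)
```

Two queries resolve this pair because the chosen radius separates the directions; when it does not, the approach retries with an updated ADB, which is why a well-chosen radius such as the distribution's median keeps the average query count low.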