Low-Cost Hard-Label Adversarial Attack with Theoretical Foundations

Hard-label black-box attacks, which rely solely on top-1 predictions, represent one of the most challenging yet practically relevant threat models. Despite recent progress, existing approaches face two key limitations: (1) they overlook the critical role of initialization and focus primarily on optimization strategies; and (2) they rely heavily on empirical heuristics without theoretical guarantees. To bridge this gap, we establish a unified theoretical framework showing that existing sign-flipping hard-label attacks can be understood as approximating the true gradient sign. Guided by this analysis, we propose a novel attack framework featuring a zero-query initialization strategy and a Pattern-Driven Optimization (PDO) algorithm. We provide theoretical guarantees that our initialization yields higher cosine similarity to the true gradient sign than random baselines, and that our PDO module achieves significantly lower query complexity than baseline search methods. Extensive experiments across CIFAR-10, ImageNet, and ObjectNet (covering standard and adversarially trained models, commercial APIs, and CLIP models) demonstrate that our method consistently outperforms state-of-the-art hard-label attacks in both success rate and efficiency, particularly under low query budgets. Furthermore, our method generalizes robustly to corrupted data (ImageNet-C), biomedical images (PathMNIST), and dense prediction tasks such as segmentation. Notably, it bypasses the stateful defense Blacklight, achieving a 0% detection rate.
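To make the quantity "cosine similarity to the true gradient sign" concrete, the following is a minimal sketch on toy data. It is not the paper's initialization or PDO algorithm. The names `sign_cosine`, `grad`, `random_sign`, and `informed_sign` are hypothetical, the "true gradient" is a random stand-in, and the 60% agreement rate is an arbitrary assumption chosen for illustration. For two sign vectors in {-1, +1}^d that disagree on h coordinates, the cosine similarity equals 1 - 2h/d, so a uniformly random sign vector has expected similarity 0.

```python
import numpy as np


def sign_cosine(u, v):
    """Cosine similarity between sign(u) and sign(v).

    Assumes neither vector has zero entries, so both sign vectors
    have Euclidean norm sqrt(d) and the cosine is dot / d.
    """
    su, sv = np.sign(u), np.sign(v)
    return float(su @ sv) / su.size


rng = np.random.default_rng(0)
d = 10_000

# Toy stand-in for the true gradient (an assumption, not real model data).
grad = rng.standard_normal(d)
true_sign = np.sign(grad)

# Random baseline: independent uniform signs, expected cosine 0.
random_sign = rng.choice([-1.0, 1.0], size=d)

# Hypothetical informed guess that agrees with the true sign on 60%
# of coordinates: flip exactly 40% of the entries of the true sign.
n_flip = 4 * d // 10
informed_sign = true_sign.copy()
flip_idx = rng.choice(d, size=n_flip, replace=False)
informed_sign[flip_idx] *= -1

cos_random = sign_cosine(random_sign, true_sign)
cos_informed = sign_cosine(informed_sign, true_sign)  # 1 - 2 * 0.4 = 0.2

print(f"random baseline: {cos_random:+.3f}")
print(f"informed guess:  {cos_informed:+.3f}")
```

In this toy setting, any initialization that agrees with the true gradient sign on more than half of the coordinates scores above the random baseline in expectation, which is the kind of comparison the abstract's guarantee refers to.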
