Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It uses a Multi-Armed Bandit algorithm to identify Pivot Sets, combinatorial groups of tokens that act as prediction anchors, and then strategically perturbs them to induce label flips. This approach captures inter-word dependencies and reduces query costs. Extensive experiments on traditional models and Large Language Models show that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
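To make the "inside-out" idea concrete, the following is a minimal sketch and not the PivotAttack implementation. It assumes a UCB1 bandit, a toy keyword-counting classifier that exposes only hard labels, token pairs as candidate Pivot Sets, and a reward of 1 whenever the label survives random masking of all tokens outside the candidate set. The names `toy_classifier`, `pull`, `find_pivot_set`, and `perturb` are hypothetical and chosen only for illustration; the abstract does not specify the bandit variant, the reward, or the perturbation operator.

```python
import itertools
import math
import random

POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "boring"}

def toy_classifier(tokens):
    """Hypothetical hard-label victim: returns only a label, never a score."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0

def pull(tokens, arm, label, rng, mask="[MASK]"):
    """One query: keep the arm's tokens, mask every other token with prob. 0.5.
    Reward is 1.0 if the hard label is unchanged (the arm anchors the label)."""
    perturbed = [tok if (i in arm or rng.random() < 0.5) else mask
                 for i, tok in enumerate(tokens)]
    return 1.0 if toy_classifier(perturbed) == label else 0.0

def find_pivot_set(tokens, set_size=2, budget=2000, seed=0):
    """UCB1 over all token-index sets of a fixed size (assumed candidate space).
    Requires budget >= number of candidate sets."""
    rng = random.Random(seed)
    label = toy_classifier(tokens)
    arms = list(itertools.combinations(range(len(tokens)), set_size))
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)

    def ucb(a, t):
        return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])

    for t in range(1, budget + 1):
        if t <= len(arms):
            k = t - 1  # initialisation: play each arm once
        else:
            k = max(range(len(arms)), key=lambda a: ucb(a, t))
        sums[k] += pull(tokens, arms[k], label, rng)
        counts[k] += 1

    best = max(range(len(arms)), key=lambda a: sums[a] / counts[a])
    return arms[best], label

def perturb(tokens, arm, substitute="[MASK]"):
    """Assumed perturbation: replace every token in the chosen set."""
    return [substitute if i in arm else tok for i, tok in enumerate(tokens)]

tokens = "the movie was great but boring and good".split()
pivot, label = find_pivot_set(tokens)
adversarial = perturb(tokens, pivot)
print(pivot, label, toy_classifier(adversarial))
```

In this toy input, the pair ("great", "good") is the only candidate whose reward is always 1, so the bandit tends to concentrate its queries on it; replacing both tokens then flips the toy label. A real attack would substitute synonyms rather than mask tokens and would need a far smaller candidate space than all token pairs.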