Recent research has shown that natural language processing (NLP) models are vulnerable to adversarial examples. However, current techniques for generating such examples rely on deterministic heuristic rules, which often fail to produce optimal adversarial examples. In response, this study proposes a new method, the Fraud's Bargain Attack (FBA), which uses a randomization mechanism to expand the search space and produce high-quality adversarial examples with a higher probability of success. FBA uses the Metropolis-Hastings sampler, a type of Markov Chain Monte Carlo sampler, to improve the selection of adversarial examples from the candidates generated by a customized stochastic process called the Word Manipulation Process (WMP). WMP modifies individual words in a contextually aware manner through insertion, removal, or substitution. Extensive experiments demonstrate that FBA outperforms other methods in terms of attack success rate, imperceptibility, and sentence quality.
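To make the sampling idea concrete, the following is a minimal sketch of a Metropolis-Hastings loop over word-level insert, remove, and substitute edits. It is not the FBA or WMP implementation: the vocabulary, the polarity-based stand-in classifier `positive_prob`, the target density `log_target`, the `strength` parameter, and the uniform (context-free) proposal are all assumptions made for illustration, whereas the abstract describes WMP as contextually aware.

```python
import math
import random
from difflib import SequenceMatcher

# Hypothetical toy vocabulary and word polarities; not taken from the paper.
VOCAB = ["good", "great", "bad", "dull", "movie", "plot", "very", "the"]
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "dull": -1.0}


def positive_prob(words):
    """Stand-in victim classifier: P(label = positive) from summed polarity."""
    logit = sum(POLARITY.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-logit))


def log_target(words, original, strength=8.0):
    """Unnormalized log-density (an assumption): rewards a low positive-class
    probability and a high similarity to the original sentence."""
    similarity = SequenceMatcher(None, original, words).ratio()
    return strength * ((1.0 - positive_prob(words)) + similarity)


def propose(words, rng):
    """Draw one uniform insert/remove/substitute edit.

    Returns (candidate, log[q(current | candidate) / q(candidate | current)]).
    """
    n, m = len(words), len(VOCAB)
    action = rng.choice(["insert", "remove", "substitute"])
    if action == "insert":
        i = rng.randrange(n + 1)
        return words[:i] + (rng.choice(VOCAB),) + words[i:], math.log(m)
    if action == "remove":
        if n == 1:  # never empty the sentence; propose staying put instead
            return words, 0.0
        i = rng.randrange(n)
        return words[:i] + words[i + 1:], -math.log(m)
    i = rng.randrange(n)  # substitution is a symmetric move
    return words[:i] + (rng.choice(VOCAB),) + words[i + 1:], 0.0


def mh_attack(original, steps=2000, seed=0):
    """Run Metropolis-Hastings and return the best-scoring state visited."""
    rng = random.Random(seed)
    original = tuple(original)
    current = best = original
    current_lp = best_lp = log_target(current, original)
    for _ in range(steps):
        candidate, log_q_ratio = propose(current, rng)
        candidate_lp = log_target(candidate, original)
        log_alpha = candidate_lp - current_lp + log_q_ratio
        # Accept with probability min(1, exp(log_alpha)).
        if rng.random() < math.exp(min(0.0, log_alpha)):
            current, current_lp = candidate, candidate_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best


sentence = "the movie is very good".split()
adversarial = mh_attack(sentence)
print(" ".join(adversarial), positive_prob(adversarial))
```

Because each insertion is paired with the removal that undoes it, the proposal ratio reduces to the vocabulary size (or its inverse), which keeps the acceptance step simple. A more realistic proposal would draw candidate words from a language model conditioned on context, and the ratio would then have to be computed from that model's probabilities.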