DNN-based language models perform excellently on a variety of tasks, but even state-of-the-art LLMs remain susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues. (1) Most textual adversarial attack methods target resource-rich languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to some previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT comprises four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we utilize HITL-GAT to conduct a case study on the Tibetan script, which can serve as a reference for adversarial research on other less-studied languages.