Language models based on deep neural networks are vulnerable to textual adversarial attacks. While high-resource languages such as English receive most of the attention, Tibetan, a cross-border language, is increasingly being studied because of its abundant ancient literature and the strategic importance of the language. Several Tibetan adversarial text generation methods already exist, but they do not fully consider the textual features of Tibetan script, and they overestimate the quality of the adversarial texts they generate. To address this issue, we propose a novel Tibetan adversarial text generation method called TSCheater, which takes into account the characteristics of Tibetan encoding and the observation that visually similar syllables have similar semantics. The method can also be transferred to other abugidas, such as Devanagari script. We use a self-constructed Tibetan syllable visual similarity database, TSVSDB, to generate substitution candidates, and adopt a greedy algorithm-based scoring mechanism to determine the substitution order. We then apply the method to eight victim language models. Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation magnitude, semantic similarity, visual similarity, and human acceptance. Finally, we construct the first Tibetan adversarial robustness evaluation benchmark, AdvTS, which is generated by existing methods and proofread by humans.
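The two steps named above, candidate generation from a similarity table and greedy ordering of substitutions, can be illustrated with a minimal sketch. It is not the TSCheater implementation: the table format, the `min_similarity` threshold, the scoring rule (drop in the victim's confidence on the true label), and every name in the code are assumptions made for illustration, and Latin placeholder tokens stand in for Tibetan syllables.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a visual similarity table such as TSVSDB:
# unit -> list of (visually similar candidate, similarity in [0, 1]).
SimilarityTable = Dict[str, List[Tuple[str, float]]]


def greedy_substitution_attack(
    units: Sequence[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],
    table: SimilarityTable,
    min_similarity: float = 0.8,  # assumed threshold, not taken from the paper
) -> Tuple[List[str], bool]:
    """Return (possibly perturbed units, whether the prediction flipped)."""
    current = list(units)
    base = predict_proba(current)[true_label]

    # Step 1: score each position by the largest confidence drop that a
    # single sufficiently similar substitution achieves at that position.
    scored: List[Tuple[float, int, str]] = []
    for i, unit in enumerate(current):
        best_drop, best_cand = 0.0, None
        for cand, sim in table.get(unit, []):
            if sim < min_similarity:
                continue
            trial = current[:i] + [cand] + current[i + 1:]
            drop = base - predict_proba(trial)[true_label]
            if drop > best_drop:
                best_drop, best_cand = drop, cand
        if best_cand is not None:
            scored.append((best_drop, i, best_cand))

    # Step 2: apply substitutions greedily, highest score first, and stop
    # as soon as the victim's predicted label changes.
    scored.sort(reverse=True)
    for _, i, cand in scored:
        current[i] = cand
        probs = predict_proba(current)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != true_label:
            return current, True
    return current, False


def toy_predict_proba(units: List[str]) -> List[float]:
    """Toy binary victim: label 1 becomes likelier with each trigger token."""
    hits = sum(u in {"good", "fine"} for u in units)
    p1 = min(0.9, 0.2 + 0.35 * hits)
    return [1 - p1, p1]


# Placeholder similarity table; the entries are invented for the demo.
toy_table: SimilarityTable = {
    "good": [("g00d", 0.9)],
    "and": [("end", 0.9)],
    "fine": [("f1ne", 0.85), ("fxne", 0.5)],
}

adversarial, flipped = greedy_substitution_attack(
    ["good", "and", "fine"], 1, toy_predict_proba, toy_table
)
```

In this toy run, the substitution for `and` is never applied because it does not lower the victim's confidence, and `fxne` is filtered out by the similarity threshold. Ordering by score keeps the number of modified units small, which is one common way to limit perturbation magnitude.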