Neural text detectors aim to identify the characteristics that distinguish neural (machine-generated) text from human-written text. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making detection increasingly difficult. Inspired by advances in mutation analysis in software development and testing, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art neural text detectors. This falls under white-box adversarial attacks: attackers have access to the original text and create mutation instances based on it. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
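To make the idea of character- and word-based mutation operators concrete, the following is a minimal sketch of what such operators might look like. It is not the operator set proposed in the paper: the substitution tables `HOMOGLYPHS` and `SYNONYMS`, and the function names `mutate_char`, `mutate_word`, and `generate_mutants`, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical lookup tables, for illustration only (not taken from the paper).
# Latin letters mapped to visually similar Cyrillic letters.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
# Lowercase words mapped to a substitute word.
SYNONYMS = {"big": "large", "quick": "fast", "show": "demonstrate"}


def mutate_char(text: str, rng: random.Random) -> str:
    """Character-level operator: swap one replaceable character for a look-alike."""
    positions = [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]
    if not positions:
        return text  # no replaceable character, so the text is returned unchanged
    i = rng.choice(positions)
    return text[:i] + HOMOGLYPHS[text[i]] + text[i + 1:]


def mutate_word(text: str, rng: random.Random) -> str:
    """Word-level operator: swap one known word for its listed substitute."""
    words = text.split()  # note: runs of whitespace collapse to single spaces
    positions = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    if not positions:
        return text  # no known word, so the text is returned unchanged
    i = rng.choice(positions)
    words[i] = SYNONYMS[words[i].lower()]
    return " ".join(words)


def generate_mutants(text: str, n: int, seed: int = 0) -> list[str]:
    """Create n mutation instances, each one operator applied to the original text."""
    rng = random.Random(seed)
    operators = [mutate_char, mutate_word]
    return [rng.choice(operators)(text, rng) for _ in range(n)]
```

Each mutant differs from the original by a single small edit, so it stays readable to a human while shifting the character or token statistics a detector may rely on. Calling `generate_mutants("the quick fox", 5)` would yield five such variants, which could then be passed to a detector to measure any change in its prediction accuracy.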