With the advent of deep learning methods, Neural Machine Translation (NMT) systems have become increasingly powerful. However, deep-learning-based systems are susceptible to adversarial attacks, where imperceptible changes to the input can cause undesirable changes in the output of the system. To date, there has been little work investigating adversarial attacks on sequence-to-sequence systems, such as NMT models. Previous work in NMT has examined attacks that aim to introduce target phrases into the output sequence. In this work, adversarial attacks on NMT systems are explored from an output perception perspective. The aim of an attack is thus to change the perception of the output sequence without altering the perception of the input sequence. For example, an adversary may distort the sentiment of translated reviews so that they carry an exaggerated positive sentiment. In practice, it is challenging to run extensive human perception experiments, so a proxy deep-learning classifier applied to the NMT output is used to measure perception changes. Experiments demonstrate that the sentiment perception of NMT systems' output sequences can be changed significantly with small, imperceptible changes to the input sequences.
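The measurement idea described above can be illustrated with a minimal sketch: translate the original and the perturbed input, score both translations with a proxy classifier, and take the difference. Everything named here (`perception_shift`, `toy_translate`, `toy_sentiment`, the tiny dictionary and lexicon) is a hypothetical stand-in chosen for illustration, not the models or attack used in this work; a real setup would presumably plug in an actual NMT system and a trained sentiment classifier.

```python
from typing import Callable


def perception_shift(
    translate: Callable[[str], str],
    classify: Callable[[str], float],
    source: str,
    perturbed: str,
) -> float:
    """Change in the proxy classifier's score on the translation
    when the source sentence is replaced by its perturbed version."""
    return classify(translate(perturbed)) - classify(translate(source))


# --- Toy stand-ins (assumptions, for illustration only) ---

# Hypothetical word-level "translation" table; the misspelling "gutt"
# is deliberately mapped to a stronger word to mimic an attack effect.
TOY_DICT = {"der": "the", "film": "film", "war": "was",
            "gut": "good", "gutt": "great"}

# Hypothetical positive-sentiment lexicon with scores in [0, 1].
TOY_LEXICON = {"good": 0.6, "great": 0.9}


def toy_translate(text: str) -> str:
    """Translate word by word, leaving unknown words unchanged."""
    return " ".join(TOY_DICT.get(w, w) for w in text.lower().split())


def toy_sentiment(text: str) -> float:
    """Return the strongest lexicon score found, or 0.5 (neutral) if none."""
    scores = [TOY_LEXICON[w] for w in text.split() if w in TOY_LEXICON]
    return max(scores, default=0.5)


if __name__ == "__main__":
    shift = perception_shift(
        toy_translate, toy_sentiment,
        source="der film war gut",
        perturbed="der film war gutt",
    )
    print(f"sentiment shift on the translated output: {shift:+.2f}")
```

A positive value indicates that the perturbation pushed the translated output toward a more positive perceived sentiment. The sketch does not check that the input perturbation is itself imperceptible; that constraint would have to be enforced separately.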