A key component of modern conversational systems is the Dialogue State Tracker (DST), which models a user's goals and needs. Toward building more robust and reliable DSTs, we introduce a prompt-based learning approach that automatically generates effective adversarial examples to probe DST models. The approach has two key characteristics: (i) it needs only the output of the DST, not its model parameters, and (ii) it learns to generate natural language utterances that can target any DST. In experiments on state-of-the-art DSTs, the proposed framework yields the greatest reduction in accuracy and the best attack success rate while maintaining good fluency and a low perturbation ratio. We also show how much the generated adversarial examples can bolster a DST through adversarial training. These results indicate the strength of prompt-based attacks on DSTs and leave open avenues for continued refinement.
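The evaluation quantities named above (accuracy, attack success rate, perturbation ratio) can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions borrowed from common practice in adversarial NLP evaluation, and all function names are hypothetical: accuracy is taken as exact match of the predicted dialogue state, attack success rate as the share of originally correct turns that become wrong under attack, and perturbation ratio as the share of whitespace tokens changed.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence

# A dialogue state as slot -> value, e.g. {"hotel-area": "north"} (illustrative format).
State = Dict[str, str]


def state_accuracy(predicted: Sequence[State], gold: Sequence[State]) -> float:
    """Assumed definition: fraction of turns whose predicted state equals the gold state."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def attack_success_rate(
    clean_pred: Sequence[State], adv_pred: Sequence[State], gold: Sequence[State]
) -> float:
    """Assumed definition: among turns predicted correctly on the clean input,
    the fraction predicted incorrectly on the adversarial input."""
    attackable = [(a, g) for c, a, g in zip(clean_pred, adv_pred, gold) if c == g]
    if not attackable:
        return 0.0
    return sum(a != g for a, g in attackable) / len(attackable)


def perturbation_ratio(original: str, adversarial: str) -> float:
    """Assumed definition: tokens not shared with the original, relative to
    the original length (whitespace tokenization)."""
    orig_tokens = original.split()
    adv_tokens = adversarial.split()
    if not orig_tokens:
        return 0.0
    matcher = SequenceMatcher(a=orig_tokens, b=adv_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    changed = max(len(orig_tokens), len(adv_tokens)) - matched
    return changed / len(orig_tokens)


# Toy data: only the tracker's outputs are used, mirroring the output-only setting.
gold = [{"hotel-area": "north"}, {"hotel-area": "south"}, {"hotel-area": "east"}]
clean = [{"hotel-area": "north"}, {"hotel-area": "south"}, {"hotel-area": "west"}]
adv = [{"hotel-area": "north"}, {"hotel-area": "centre"}, {"hotel-area": "west"}]

acc_drop = state_accuracy(clean, gold) - state_accuracy(adv, gold)
asr = attack_success_rate(clean, adv, gold)
ratio = perturbation_ratio(
    "i need a cheap hotel in the north",
    "i need a cheap hotel up in the north",
)
```

Nothing in this sketch touches model parameters: every quantity is computed from predicted states and utterance text alone, which is what makes such an evaluation applicable to any tracker.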