Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite the prevalence of such noise, there is currently no accurate survey of dialogue noise, nor is there a clear sense of how each noise type affects task performance. This paper addresses this gap by first constructing a taxonomy of the noise encountered by dialogue systems. We then run a series of experiments to show how different models behave when subjected to varying levels and types of noise. Our results reveal that models are quite robust to the label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof of concept for targeted dialogue denoising.