In real dialogue scenarios, utterances contain unknown input noise, so existing supervised slot filling models often perform poorly in practical applications. Although some studies have addressed noise-robust models, they are evaluated only on rule-based synthetic datasets, which limits progress on noise-robust methods. In this paper, we introduce Noise-SF, a noise robustness evaluation dataset for the slot filling task. The proposed dataset contains five types of human-annotated noise, all of which exist in real [...] extensive robust-training methods for slot filling into the proposed framework. Through exhaustive empirical evaluation on Noise-SF, we find that baseline models perform poorly in robustness evaluation and that the proposed framework can effectively improve model robustness. Based on these empirical results, we offer some forward-looking suggestions to fuel research in this direction. Our dataset Noise-SF will be released at https://github.com/dongguanting/Noise-SF.