Creating effective and reliable task-oriented dialog systems (ToDSs) is challenging, not only because of the complex structure of these systems, but also due to the scarcity of training data, especially when several modules need to be trained separately, each one with its own input/output training examples. Data augmentation (DA), whereby synthetic training examples are added to the training data, has been successful in other NLP systems, but has not been explored as extensively in ToDSs. We empirically evaluate the effectiveness of DA methods in an end-to-end ToDS setting, where a single system is trained to handle all processing stages, from user inputs to system outputs. We experiment with two ToDSs (UBAR, GALAXY) on two datasets (MultiWOZ, KVRET). We consider three types of DA methods (word-level, sentence-level, dialog-level), comparing eight DA methods that have shown promising results in ToDSs and other NLP systems. We show that all DA methods considered are beneficial, and we highlight the best ones, also providing advice to practitioners. We also introduce a more challenging few-shot cross-domain ToDS setting, reaching similar conclusions.