Robust state tracking for task-oriented dialogue systems remains restricted to a few popular languages. This paper shows that, given a large-scale dialogue dataset in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatic translation of dialogue datasets with alignment, which ensures faithful translation of slot values and eliminates the costly human supervision used in previous benchmarks. We also propose a new contextual semantic parsing model, which encodes only the formal slots and values together with the last agent and user utterances. We show that this succinct representation reduces the compounding effect of translation errors without harming accuracy in practice. We evaluate our approach on several dialogue state tracking benchmarks. On the RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets, we improve the state of the art by 11%, 17%, 20%, and 0.3% in joint goal accuracy, respectively. We present a comprehensive error analysis for all three datasets, showing that erroneous annotations can lead to misguided judgments on the quality of the model. Finally, we present RiSAWOZ English and German datasets, created using our translation methodology. On these datasets, accuracy is within 11% of the original, showing that high-accuracy multilingual dialogue datasets are possible without relying on expensive human annotations. We release our datasets and software as open source.
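The succinct input described above (formal slots and values plus only the last agent and user utterances) and the joint goal accuracy metric can be illustrated with a minimal sketch. This is not the released implementation: the function names, the `<state>`, `<agent>`, and `<user>` marker strings, and the slot naming are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# A dialogue state maps slot names to values (an assumed representation).
State = Dict[str, str]


def build_parser_input(state: State, last_agent: str, last_user: str) -> str:
    """Serialize the current state and only the last agent/user turns.

    The marker strings are hypothetical, not the paper's actual format.
    """
    # Sort slots so the serialization is deterministic.
    slots = " ; ".join(f"{slot} = {value}" for slot, value in sorted(state.items()))
    return f"<state> {slots or 'null'} <agent> {last_agent} <user> {last_user}"


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    exact = sum(p == g for p, g in zip(predicted, gold))
    return exact / len(gold)
```

Because earlier turns are summarized by the formal state rather than replayed as text, translated utterances from previous turns never re-enter the model input, which is the intuition behind the reduced compounding of translation errors.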