Creating high-quality annotated data for task-oriented dialog (ToD) is notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. As a result, existing multilingual ToD datasets remain scarce and suffer from limitations such as non-native, translation-based dialogs with translation artefacts, small scale, and a lack of cultural adaptation. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. To address all the identified limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in four languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe the complex bottom-up data collection process that yielded the final dataset and offer the first sets of baseline scores across different ToD-related tasks for future reference, which also highlight the dataset's challenging nature.