Task-oriented dialogue (ToD) systems have mostly been developed for high-resource languages such as English and Chinese. However, ToD systems are also needed for other regional and local languages so that they can understand dialogue contexts across a wider range of languages. This paper introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian. We extend two English ToD datasets, which together cover four different domains, to Indonesian, and use delexicalization to efficiently reduce the amount of annotation required. To ensure high-quality data collection, we hire native speakers to translate the dialogues manually. Together with the original English datasets, these new Indonesian datasets serve as an effective benchmark for evaluating Indonesian and English ToD systems and for exploring the potential benefits of cross-lingual and bilingual transfer learning approaches.
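To illustrate why delexicalization reduces translation effort, here is a minimal, generic sketch. Slot values in an utterance are replaced by placeholders, only the resulting template needs human translation, and the values are filled back in afterwards. It assumes slot values appear verbatim in the utterance. The function names `delexicalize` and `relexicalize` and the `[slot]` placeholder format are hypothetical illustrations, not the actual IndoToD pipeline.

```python
import re
from typing import Dict


def delexicalize(utterance: str, slots: Dict[str, str]) -> str:
    """Replace each annotated slot value with a [slot_name] placeholder (hypothetical format)."""
    # Replace longer values first so a value containing a shorter one is replaced whole.
    for name, value in sorted(slots.items(), key=lambda kv: len(kv[1]), reverse=True):
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        # A function replacement avoids backslash interpretation in the placeholder text.
        utterance = pattern.sub(lambda _m, tag=f"[{name}]": tag, utterance)
    return utterance


def relexicalize(template: str, slots: Dict[str, str]) -> str:
    """Fill [slot_name] placeholders back in with (possibly localized) slot values."""
    for name, value in slots.items():
        template = template.replace(f"[{name}]", value)
    return template


if __name__ == "__main__":
    slots = {"restaurant": "Sushi Tei", "people": "4"}
    template_en = delexicalize("Book a table at Sushi Tei for 4 people.", slots)
    print(template_en)  # Book a table at [restaurant] for [people] people.
    # Assumed output of a human translator working on the template only:
    template_id = "Pesan meja di [restaurant] untuk [people] orang."
    print(relexicalize(template_id, slots))  # Pesan meja di Sushi Tei untuk 4 orang.
```

Because many dialogues share the same templates once their values are stripped, translators can work on fewer distinct sentences. Localized slot values (for example, Indonesian place names) can then be substituted automatically.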