Task-oriented dialogue (TOD) systems have been widely deployed in many industries as they deliver more efficient customer support. These systems are typically constructed for a single domain or language and do not generalise well beyond it. To support work on Natural Language Understanding (NLU) in TOD across multiple languages and domains simultaneously, we constructed MULTI3NLU++, a multilingual, multi-intent, multi-domain dataset. MULTI3NLU++ extends the English-only NLU++ dataset to include manual translations into a range of high-, medium-, and low-resource languages (Spanish, Marathi, Turkish and Amharic), in two domains (BANKING and HOTELS). Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals, and therefore allows us to measure the realistic performance of TOD systems in a varied set of the world's languages. We use MULTI3NLU++ to benchmark state-of-the-art multilingual models on the NLU tasks of intent detection and slot labelling for TOD systems in the multilingual setting. The results demonstrate the challenging nature of the dataset, particularly in the low-resource language setting, offering ample room for future experimentation in multi-domain multilingual TOD setups.